Résumé

Many recent experiments in biology measure systems composed of several interacting components, such as neuron networks. Experimentally, one normally has access only to the collective behavior of the system, even though one is often interested in characterizing the interactions between its different components. This thesis aims to extract information about the microscopic interactions of a system from its collective behavior in two distinct cases. First, we study a system described by a generalized Ising model. We find explicit formulas for the couplings as a function of the correlations and magnetizations. Next, we consider a system described by a Hopfield model. In this case, we obtain not only an explicit formula for inferring the patterns, but also a result that allows one to estimate the number of measurements needed for an accurate inference.
Abstract

Several recent experiments in biology study systems composed of many interacting elements, for example neuron networks. Normally, measurements describe only the collective behavior of the system, even if in most cases we would like to characterize how its different parts interact. The goal of this thesis is to extract information about the microscopic interactions from the collective behavior in two different cases. First, we study a system described by a generalized Ising model. We find explicit formulas for the couplings as a function of the correlations and magnetizations. Next, we study a system described by a Hopfield model. In this case, we find not only an explicit formula for inferring the patterns, but also an analytical result that allows one to estimate how much data is necessary for a good inference.
Chapter 1

Biological motivation and related models

In recent years, we have seen a remarkable growth in the number of biological experiments that generate an overwhelming quantity of data. In several cases, such as neuron assemblies, proteins and gene networks, most of the data analysis focuses on identifying correlations between different parts of the system. Unfortunately, correlations on their own are of limited scientific value: most of the underlying properties of the system can only be understood by describing the interactions between its different parts. This work contributes to the development of statistical mechanics tools for deriving these interactions from measured correlations.

In this introductory chapter, we present two biological problems that inspired 
this thesis. First, in section 1.1 we give a brief introduction to neurons and how 
they exchange information in a network. We discuss some experiments where the 
individual activity of up to a hundred interacting neurons is measured. For this 
example, the neurons are the interacting parts and they interact via synapses, whose 
details are very hard to extract experimentally. 

In the second part, we discuss some recent works on the analysis of families of homologous proteins, i.e., proteins that share an evolutionary ancestry and a function. The amino acid variations within these families are highly correlated, and these correlations are deeply related to the biological function of the proteins. In general terms, we can thus say that individual amino acid variations play the role of interacting parts with very complicated interactions, as we will see in section 1.2.

1.1 Neuron networks 

One of the most important scientific questions of the 21st century is the understanding of the brain. It is widely accepted that its complexity is due to the organization of neurons in complex networks. The human brain, for example, contains about $10^{11}$ neurons linked by about $10^{14}$ connections. Even much simpler organisms like the fruit fly Drosophila melanogaster have about 100,000 neurons.

A typical neuron can be schematized as a cell composed of three parts: the cell body, the dendrites and one axon (see Fig. 1.1). The dendrites form a tree-like structure of branches and are responsible for receiving electrical signals from other neurons. The axon is a longer, ramified single filament responsible for sending electrical signals to other neurons. In most cases, a connection between two neurons happens between an axon and a dendrite¹. We call such connections synapses.



Figure 1.1: Schema of a neuron [Alberts 02] (labels: axon, less than 1 mm to more than 1 m in length; axon terminals make synapses on target cells). The diameter of the cell body is typically of the order of 10 μm, while the length of dendrites and axons varies considerably with the neuron's function.



Like most cells, neurons have an electrical potential difference between their cytoplasm and the extracellular medium. This potential difference is regulated by the exchange of ions (such as Na⁺ and K⁺) through the cell membrane, which can happen in two ways: passively, through proteins called ion channels that selectively allow a given ion to pass from the more concentrated medium to the less concentrated one, and actively, through proteins called ion pumps that consume energy to increase the ion concentration difference.

A typical neuron has a voltage difference of about −70 mV when it is not receiving any signal from other neurons. We call this voltage the resting potential of the neuron. If the voltage of a neuron reaches a threshold (typically about −50 mV), a feedback mechanism causes ion channels in the membrane to open, making the voltage increase rapidly up to 100 mV (depending on the neuron type), after which it saturates and decreases quickly, recovering the resting potential after a few ms (see Fig. 1.2). We call this process firing or spiking. One important characteristic of spikes is that, once the voltage reaches the threshold, their shape and intensity do not depend on the details of how the threshold was attained.



¹ As is common in biology, such a simplified description of neurons and synapses has exceptions: some dendrites transmit signals while some axons receive them, and we also find axon-axon and dendrite-dendrite synapses [Churchland 89].
Figure 1.2: Typical voltage of a firing neuron as a function of time (in ms) [Naundorf 06].



When a neuron fires, its axon releases neurotransmitters at every synapse. These neurotransmitters cause ion channels in the dendrites to open, changing the membrane potential of the receiving neurons. Different neurotransmitters cause the opening of different ion channels, allowing for both excitatory synapses, which increase the neuron potential, and inhibitory synapses, which decrease it. Since synapses can be excitatory or inhibitory to different degrees, most models define a synaptic weight, with the convention that excitatory synapses have a positive synaptic weight and inhibitory synapses a negative one, as we will see in section 1.1.2. Another important feature of synapses is that they are directional: if a neuron A can excite a neuron B, the converse is not necessarily true: neuron B might inhibit neuron A, or simply not be connected to it at all.



1.1.1 Multi-neuron recording experiments 

While much progress has been made in describing individual neurons, understanding their complex interactions in a network is still an unsolved problem. One of the most promising advances in this area was the development of techniques for simultaneously recording the electrical activity of several individual cells [Meister 94].

In these experiments, a microarray with as many as 250 electrodes is placed in contact with the brain tissue. The potential of each electrode is recorded for up to a few hours. Each electrode might be affected by the activity of more than one neuron and, conversely, a single neuron might affect more than one electrode. Thus, a computationally intensive calculation is needed to decompose the signal as a sum of the contributions of several different neurons. This procedure is known as spike sorting [Peyrache 09], and its result is a set of spike trains, i.e., time sequences of the state of each cell: firing or at rest. An example of a set of spike trains can be seen in Fig. 1.3.
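To make the step from spike trains to the quantities used later concrete, here is a minimal sketch. It assumes that spike times are available as one list per neuron and that time is cut into bins of a chosen width `dt` (both the input format and the helper names `binarize_spike_trains` and `empirical_moments` are hypothetical); each bin becomes σ = +1 if the neuron fired in it and σ = −1 otherwise, and the binned data give the empirical magnetizations ⟨σ_i⟩ and correlations ⟨σ_iσ_j⟩.

```python
import numpy as np

def binarize_spike_trains(spike_times, t_max, dt):
    """Map spike times to an (n_bins, n_neurons) array of +1/-1 spins.

    spike_times: list of 1D sequences, one per neuron (hypothetical format).
    A bin is +1 if the neuron fired at least once in it, -1 otherwise.
    """
    n_bins = int(np.ceil(t_max / dt))
    sigma = -np.ones((n_bins, len(spike_times)), dtype=int)
    for i, times in enumerate(spike_times):
        bins = np.floor(np.asarray(times, dtype=float) / dt).astype(int)
        bins = bins[(bins >= 0) & (bins < n_bins)]  # drop spikes outside the window
        sigma[bins, i] = 1
    return sigma

def empirical_moments(sigma):
    """Magnetizations <s_i> and pairwise correlations <s_i s_j> over time bins."""
    m = sigma.mean(axis=0)
    c = sigma.T @ sigma / sigma.shape[0]
    return m, c

# Two neurons, spike times in ms, 10 ms bins over a 50 ms recording.
spikes = [[5, 21, 43], [6, 44]]
sigma = binarize_spike_trains(spikes, t_max=50, dt=10)
m, c = empirical_moments(sigma)
```

The choice of bin width is a modeling decision: bins that are too wide merge distinct spikes, while bins that are too narrow make almost every entry −1.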
Figure 1.3: Typical measurement of spike trains [Peyrache 09]. Each line corresponds to a single neuron. Black vertical bars correspond to spikes.



In principle, one should be able to pinpoint the synapses and the synaptic weights from the spike trains. However, extracting this information is a considerable challenge. First of all, current technology cannot measure every neuron of a network, so experiments record just a small fraction of the system, even if the network is small. This means that all we can expect to find are effective interactions, which also reflect all the links involving cells that are not measured. Secondly, one cannot naively conclude that two neurons are connected by a synapse just because their activities are correlated. Consider, for example, neurons 7, 22 and 27 of Fig. 1.3, indicated by the red arrows. We can clearly see that all three tend to fire at the same time, but distinguishing between the two possible configurations shown in Fig. 1.4 is not trivial.
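A toy simulation illustrates why correlation alone cannot reveal a synapse. The generative rule below is a hypothetical assumption chosen for illustration: a hidden driver neuron fires in 20% of the time bins, and two follower neurons A and C each fire with probability 0.8 when the driver fires and 0.05 otherwise, with no link at all between A and C. Their connected correlation, in the ±1 convention, is nevertheless clearly positive (about 0.36 for these rates).

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 100_000

# Hidden driver neuron fires in 20% of the time bins (assumed rate).
driver = rng.random(n_bins) < 0.2

def follower(driver, p_if_driven=0.8, p_spont=0.05):
    """Fire with high probability when the driver fires, rarely otherwise.
    Followers receive input from the driver only, never from each other."""
    p = np.where(driver, p_if_driven, p_spont)
    return rng.random(driver.size) < p

a = follower(driver)
c = follower(driver)

# Connected correlation <s_a s_c> - <s_a><s_c> in the +1/-1 convention.
s_a, s_c = 2 * a - 1, 2 * c - 1
connected = np.mean(s_a * s_c) - s_a.mean() * s_c.mean()
print(f"connected correlation between A and C: {connected:.3f}")
```

If only A and C were recorded, this correlation would be indistinguishable from that produced by a direct excitatory synapse between them.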




Figure 1.4: Two different possible configurations for three positively correlated neurons. 



1.1.2 Models for neuron networks 

Before discussing what has already been done to solve this problem and our contribution to it, we present some models for neural networks. We start with a model that describes real biological networks rather faithfully, the leaky integrate-and-fire model. Afterwards, we introduce the Ising model, which is much more tractable analytically. Finally, we look at the Ising model from a different point of view by studying one particular case of it: the Hopfield model.
Leaky integrate-and-fire model

The leaky integrate-and-fire model, first proposed by Lapicque in 1907 [Lapicque 07, Abbott 99, Gerstner 02, Burkitt 06], is a straightforward model of the firing process presented in section 1.1. It assumes that neurons behave like capacitors with a small leakage term to account for the fact that the membrane is not a perfect insulator. Denoting by $V(t)$ the potential difference between the inside and the outside of the membrane, we write

$$ C\,\frac{dV}{dt} = -\frac{V(t)}{R} + I(t), \tag{1.1} $$

where $C$ is the capacitance of the neuron, $R$ is the resistance of the cell membrane and $I(t)$ is the total current due to the synapses of neighboring neurons. If we introduce the characteristic leakage time $\tau = RC$, we can rewrite this equation as

$$ \tau\,\frac{dV}{dt} = -V(t) + R\,I(t). \tag{1.2} $$

Spikes are modeled solely by their firing times $t_f$. A firing time is defined as the moment when the neuron's potential reaches a firing threshold value $V_{\rm tr}$. Implicitly, it is given by the equation

$$ V(t_f) = V_{\rm tr}. \tag{1.3} $$

Every time a neuron spikes, its potential is reset to zero and its synapses produce a signal in the form of some function $f(t)$. We can thus describe the signal $S(t)$ that this neuron sends to its neighbors as a function only of the set of firing times $\{t_f^1, \dots, t_f^{N_{\rm spikes}}\}$:

$$ S(t) = \sum_{r=1}^{N_{\rm spikes}} f(t - t_f^r). \tag{1.4} $$

The choice of the function $f(t)$ can be based on biological measurements, mimicking the behavior shown in Fig. 1.2, or can be a simple Dirac delta function to make calculations easier.

Finally, we can introduce the synaptic weights to model a neuron network with the following equations:

$$ \tau\,\frac{dV_i}{dt} = -V_i(t) + \sum_j J_{ij}\,S_j(t), \tag{1.5} $$

$$ V_i(t_f^{i,r}) = 1, \tag{1.6} $$

$$ S_i(t) = \sum_{r=1}^{N_{\rm spikes}^{i}} f(t - t_f^{i,r}), \tag{1.7} $$

where $V_i$ is the potential of neuron $i$ rescaled so that $V_{\rm tr} = 1$, $t_f^{i,r}$ is the time of the $r$-th spike of neuron $i$ and $J_{ij}$ is the matrix of synaptic weights. Note that it is usually assumed, as an approximation, that the function $f(t)$ is identical for all neurons.

This model is very popular due to its balance between biological accuracy and relative simplicity. It is also very well suited for computer simulations by direct integration of its differential equations. On the other hand, while some numerical work has been done on inferring synaptic weights from spike trains with this model [Cocco 09], it is not very practical for obtaining analytical results.
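Direct integration of Eqs. (1.5)-(1.7) can be sketched in a few lines. This is a minimal forward-Euler sketch under assumptions that are not part of the model as stated above: $f(t)$ is taken to be a Dirac delta (so a spike of neuron $j$ makes $V_i$ jump by $J_{ij}/\tau$), a constant rescaled external drive `i_ext` is added so that the neurons fire at all, and the time step is fixed. The function name `simulate_lif` and all parameter values are illustrative.

```python
import numpy as np

def simulate_lif(J, i_ext, tau=10.0, t_max=200.0, dt=0.01):
    """Forward-Euler integration of tau dV_i/dt = -V_i + i_ext + sum_j J_ij S_j(t).

    Assumptions (not from the text): f(t) is a Dirac delta, so a spike of
    neuron j makes V_i jump by J_ij / tau; a constant drive i_ext is added;
    potentials are rescaled so that V_tr = 1. Times are in ms.
    Returns one list of spike times per neuron.
    """
    n = J.shape[0]
    v = np.zeros(n)
    spikes = [[] for _ in range(n)]
    fired = np.zeros(n, dtype=bool)
    for step in range(1, int(round(t_max / dt)) + 1):
        # Leak and drive, plus delta kicks from spikes emitted at the previous step.
        v += dt / tau * (-v + i_ext) + J @ fired.astype(float) / tau
        fired = v >= 1.0      # threshold condition, Eq. (1.6)
        v[fired] = 0.0        # reset to zero after a spike
        for i in np.flatnonzero(fired):
            spikes[i].append(step * dt)
    return spikes

# Two uncoupled neurons with the same drive.
uncoupled = simulate_lif(np.zeros((2, 2)), i_ext=1.5)
isi = np.diff(uncoupled[0])   # interspike intervals of neuron 0
```

For uncoupled neurons with constant drive, the interspike interval can be checked against the exact solution of Eq. (1.2) with reset, $\tau\ln[i_{\rm ext}/(i_{\rm ext}-1)] = 10\ln 3 \approx 10.99$ ms here; the Euler scheme reproduces it to within one time step.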

Generalized Ising model / Boltzmann Machine

The Boltzmann Machine [McCulloch 43] is a model of neuron networks that mimics real biological systems less faithfully than the leaky integrate-and-fire model. It is, however, considerably simpler, and is even exactly solvable for some special networks. In this model, the state of a neuron is fully described by a spin variable $\sigma = \pm 1$, with the convention $\sigma = +1$ if the neuron is firing and $\sigma = -1$ if it is not². The dynamics of the system is ignored³ and we describe only the probability $P(\{\sigma_1, \dots, \sigma_N\})$ of finding the network of $N$ neurons in a state $\{\sigma_1, \dots, \sigma_N\}$, which is given by the Boltzmann weight of a generalized Ising model

$$ P(\{\sigma_1,\dots,\sigma_N\}) = \frac{1}{Z}\, e^{-\beta H(\{\sigma_1,\dots,\sigma_N\})}, \tag{1.8} $$

with

$$ Z = \sum_{\{\sigma\}} e^{-\beta H(\{\sigma_1,\dots,\sigma_N\})}, \tag{1.9} $$

where $Z$ is the partition function of the model, $\beta$ is a parameter of the model that, in the context of spins, represents the inverse temperature, and we introduced the notation

$$ \sum_{\{\sigma\}} \equiv \sum_{\sigma_1=\pm1}\,\sum_{\sigma_2=\pm1}\cdots\sum_{\sigma_N=\pm1}. \tag{1.10} $$

The Hamiltonian should take into account the connections between neurons and the fact that some minimum input is needed for a neuron to reach the threshold and fire. The widely used expression is

$$ H(\{\sigma_1,\dots,\sigma_N\}) = -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j - \sum_i h_i\,\sigma_i, \tag{1.11} $$

where $J_{ij}$ corresponds to the synaptic weight and $h_i$ is a term that models the threshold as a "field" favoring the resting state of the neuron.
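Because $Z$ in Eq. (1.9) is a finite sum, all quantities can be computed by brute force for a handful of neurons. The sketch below is an illustration, not a method from this thesis: the hypothetical helper `ising_moments` enumerates the $2^N$ states, weights them with Eq. (1.8) using the energy of Eq. (1.11), and returns $\langle\sigma_i\rangle$ and $\langle\sigma_i\sigma_j\rangle$. The cost grows as $2^N$, which is exactly why the inverse problem for realistic $N$ needs the approximate methods developed later.

```python
import itertools
import numpy as np

def ising_moments(J, h, beta=1.0):
    """Exact <s_i> and <s_i s_j> for the generalized Ising model, Eq. (1.11).

    Brute-force sum over all 2^N states; only feasible for N up to ~20.
    J must be symmetric with zero diagonal.
    """
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # H = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i; the 0.5 undoes double counting.
    energy = -0.5 * np.einsum("ki,ij,kj->k", states, J, states) - states @ h
    weights = np.exp(-beta * (energy - energy.min()))  # shift avoids overflow
    p = weights / weights.sum()                         # Boltzmann weights, Eq. (1.8)
    m = p @ states
    c = states.T @ (states * p[:, None])
    return m, c

# Two coupled spins without fields: <s_1 s_2> should equal tanh(J_12).
m, c = ising_moments(np.array([[0.0, 0.5], [0.5, 0.0]]), np.zeros(2))
```

For a single spin in a field $h$ the same function returns $\tanh h$, a quick sanity check on the sign conventions.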

Two features of this model are particularly pertinent for what follows. First, it is directly defined in the language of statistical physics, so its framework can be used with no additional complications. Second, if one measures the averages $\langle\sigma_i\rangle$ and the correlations $\langle\sigma_i\sigma_j\rangle$ of a spike train, the Boltzmann machine arises naturally as a model consistent with these measurements, as we will discuss in more detail in section 3.1. On the other hand, a significant shortcoming of this model is that synapses are symmetric, i.e., $J_{ij} = J_{ji}$, which is not necessarily true in biological systems.

² The convention $\sigma = 1$ for a firing neuron and $\sigma = 0$ for a resting one is also common.

³ If needed, a time evolution can be defined in this model using Glauber dynamics.

It is important to note that the Hamiltonian of Eq. (1.11) can give rise to a rich diversity of behaviors depending on the choice of $J_{ij}$: ferromagnetism, frustration, glassy phases, etc., as we will see in chapter 2.

Particular case: Hopfield model 

Until now, we have presented models for neurons in completely arbitrary networks. In this section we describe a model that uses the same description of neurons as the previous section but restricts the synaptic weights $J_{ij}$ to a particular form:

$$ J_{ij} = \sum_{\mu=1}^{p} \xi_i^\mu\,\xi_j^\mu, \tag{1.12} $$

where the $\xi_i^\mu$ are real values that we will discuss in the following. This particular case of the generalized Ising model is called the Hopfield model and was proposed to describe a system that stores a given number $p$ of memories and is capable of retrieving them when given a suitable input. The form of Eq. (1.12) was chosen so that, for zero fields and up to an additive constant coming from the diagonal terms, the Hamiltonian of Eq. (1.11) can be rewritten as

$$ H = -\frac{1}{2}\sum_{\mu=1}^{p}\Big(\sum_i \xi_i^\mu\,\sigma_i\Big)^2, \tag{1.13} $$

which has the property that $\sigma_i = \mathrm{sign}(\xi_i^\mu)$ is a local energy minimum for every $\mu$ if $p \ll N$ and the patterns are approximately orthogonal.

The interpretation of this model as a model of associative memory comes from the fact that, under certain conditions, if the system starts from a configuration similar to one of the vectors $\xi^\mu$, it will evolve to the configuration $\sigma_i = \mathrm{sign}(\xi_i^\mu)$. In this context, we normally call the $p$ vectors $\{\xi^1, \dots, \xi^p\}$ memories (or patterns). More precisely, we will see in section 2.4 that in the limit $N \gg 1$, the system can retrieve up to $\alpha_c N$ stored binary patterns, with $\alpha_c \approx 0.138$.
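Associative retrieval can be demonstrated with a short simulation. This is a minimal sketch under stated assumptions: random binary ±1 patterns, Hebbian couplings of the form of Eq. (1.12) with self-couplings removed, and zero-temperature asynchronous dynamics $\sigma_i \leftarrow \mathrm{sign}(\sum_j J_{ij}\sigma_j)$, a standard choice that is assumed here rather than taken from the text. The overall normalization of $J_{ij}$ does not affect this dynamics. With $p/N = 0.025$, far below $\alpha_c$, a copy of a pattern with 20% of its spins flipped should flow back to the stored pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5                           # p/N = 0.025, well below alpha_c ~ 0.138
xi = rng.choice([-1, 1], size=(p, n))   # random binary patterns xi^mu

# Hebbian couplings J_ij = sum_mu xi_i^mu xi_j^mu, without self-couplings.
J = xi.T @ xi.astype(float)
np.fill_diagonal(J, 0.0)

def retrieve(J, sigma, sweeps=10):
    """Zero-temperature asynchronous dynamics: s_i <- sign(sum_j J_ij s_j)."""
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(sigma)):
            field = J[i] @ sigma
            if field != 0:                      # leave s_i unchanged on a tie
                sigma[i] = 1 if field > 0 else -1
    return sigma

# Start from pattern 0 with about 20% of the spins flipped.
noisy = xi[0].copy()
noisy[rng.random(n) < 0.2] *= -1
final = retrieve(J, noisy)
overlap = final @ xi[0] / n   # 1.0 means perfect retrieval
```

Increasing `p` towards and beyond $0.138\,N$ makes the final overlap drop, which is a simple way to see the capacity limit discussed in section 2.4.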

The Hopfield model can also be seen as an approximation of the general Boltzmann Machine by a finite-rank $J$ matrix. Indeed, let us write the eigenvector decomposition of the matrix $J$,

$$ J_{ij} = \sum_{a=1}^{N} \lambda_a\, v_{a,i}\, v_{a,j}, \tag{1.14} $$

with $\{\lambda_a\}$ and $\{v_{a,i}\}$ being respectively the eigenvalues and eigenvectors of $J$. If we truncate this sum to the $p$ largest eigenvalues and set $\xi_i^a = \sqrt{\lambda_a}\, v_{a,i}$, we find exactly the same equation as Eq. (1.13). On the other hand, taking $p = N$ does not make this approximation exact, since Eq. (1.13) cannot account for negative eigenvalues of the matrix $J_{ij}$.
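The truncation argument can be checked numerically. In the sketch below (an illustration with arbitrary sizes), a rank-3 coupling matrix with positive eigenvalues is built, its three largest eigenpairs are turned into patterns $\xi_i^a = \sqrt{\lambda_a}\,v_{a,i}$, and the Hopfield form reproduces $J$. A 2×2 example then shows the obstruction mentioned above: an eigenvalue of −1 has no real square root, so no real patterns can produce it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
xi_true = rng.normal(size=(p, n))
J = xi_true.T @ xi_true            # rank-p coupling matrix, eigenvalues >= 0

# Eigen-decomposition J = sum_a lambda_a v_a v_a^T, Eq. (1.14).
lam, v = np.linalg.eigh(J)                 # eigenvalues in ascending order
order = np.argsort(lam)[::-1][:p]          # keep the p largest eigenvalues
xi = np.sqrt(lam[order])[:, None] * v[:, order].T   # xi^a_i = sqrt(lambda_a) v_{a,i}

J_hopfield = xi.T @ xi                     # J_ij = sum_a xi^a_i xi^a_j
print(np.allclose(J, J_hopfield))          # True: J has rank p

# A coupling matrix with a negative eigenvalue cannot be written in this form.
J_frustrated = np.array([[0.0, 1.0], [1.0, 0.0]])
lam2, _ = np.linalg.eigh(J_frustrated)     # eigenvalues [-1, 1]
```

The recovered patterns are only defined up to a rotation within the $p$-dimensional eigenspace, so `xi` generally differs from `xi_true` even though both give the same couplings.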

This model gained renewed interest when experimentalists started looking for patterns in spike train recordings. A recent experiment with rats performed by Peyrache et al. [Peyrache 09] compared the spike activity of neurons at two different moments: when the rat was looking for food in a maze and when it was sleeping. The main statistical tool used by the authors was Principal Component Analysis (PCA), i.e., finding the eigenvalues and eigenvectors of the correlation matrix of the measured neuron activity. They showed that the eigenvectors most strongly correlated with neuron activity when the rat was choosing a direction in the maze were revisited during its subsequent sleep. The authors interpreted this finding as the well-known process of memory consolidation during sleep. In part III, we will show that the authors' procedure for extracting patterns from neural data using PCA is closely related to fitting spike trains with a Hopfield model.
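The core PCA step can be sketched on synthetic data. The data-generating rule is a hypothetical assumption for illustration, not the experimental protocol: neurons 0 to 4 form an "assembly" that fires together in 10% of the time bins, and every neuron also fires at random in 5% of the bins. Diagonalizing the correlation matrix of the binned activity, the eigenvector with the largest eigenvalue should be concentrated on the assembly neurons.

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_bins = 20, 5000

# Synthetic binned activity (assumption): neurons 0-4 co-fire during
# assembly events; all neurons also fire at random.
assembly = np.zeros(n_neurons)
assembly[:5] = 1.0
events = rng.random(n_bins) < 0.1
activity = (rng.random((n_bins, n_neurons)) < 0.05) | (events[:, None] & (assembly > 0))
activity = activity.astype(float)

# PCA: eigenvectors of the correlation matrix of the binned activity.
corr = np.corrcoef(activity, rowvar=False)
lam, vecs = np.linalg.eigh(corr)
top = vecs[:, np.argmax(lam)]

# Cosine between the leading component and the assembly indicator vector.
alignment = abs(top @ assembly) / np.linalg.norm(assembly)
```

The sign of an eigenvector is arbitrary, hence the absolute value in the alignment score.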



1.2 Homologous proteins 

We say that two different proteins are homologous if they have both a common evolutionary origin [Reeck 87] and a similar sequence, which normally also implies a similar function. Comparing the proteins of a homologous group gives valuable insight into which features are really essential for their biological function.

YES_XIPHE MGCvrSKEaKgPAlKYqpdNsnvvPvSahlgHYGpeptimg
YES_AVISY dKgPAmKYrtdNtp-ePiSshvsHYGsdssqat
YES_CHICK MGCikSKEdKgPAmKYrtdNtp-ePiSshvsHYGsdssqat
YES_HUMAN MGCikSKEnKsPAiKYrpeNtp-ePvStsvsHYGaepttvs
YES_MOUSE MGCikSKEnKsPAiKYtpeNlt-ePvSpsasHYGvehatva



Figure 1.5: A set of 41 sequences containing the SH2 domain. Each line corresponds to a different protein and each letter to an amino acid, with conserved ones in bold. These sequences were aligned using multiple sequence alignment software [Edgar 04].

The first step in comparing two or more homologous proteins is to align their sequences in a way that maximizes the number of identical residues in each column (see Fig. 1.5). This procedure is known as Multiple Sequence Alignment (MSA) [Lockless 99]. Since residues may have been inserted or deleted during evolution, the optimal alignment involves inserting gaps in an optimal way, which makes the MSA problem NP-complete: no known algorithm solves it exactly in polynomial time, and the standard dynamic-programming approach needs a number of operations that grows exponentially with the number of sequences.

It is natural to suppose that the most important parts of a protein should vary significantly less than the least important ones, since most mutations in important parts yield non-functional proteins. Consequently, the most straightforward analysis one can do with aligned sequences is to evaluate how much the distribution of amino acids at a given position deviates from a uniform random distribution [Capra 07].
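One simple way to quantify this deviation is sketched below; it is only one of several possible conservation scores. The sketch assumes an alignment given as equal-length strings over a 21-symbol alphabet (the 20 amino acids plus the gap symbol `-`, an assumed convention), and scores each column by the Kullback-Leibler divergence between its observed amino acid frequencies and the uniform distribution, which equals $\log q$ minus the column entropy. A fully conserved column gets the maximal score $\log 21$.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"   # 20 amino acids plus gap (assumed convention)

def column_conservation(alignment):
    """KL divergence (in nats) of each column's amino acid frequencies from the
    uniform distribution over ALPHABET, i.e. log(q) - column entropy.
    Symbols outside ALPHABET are simply not counted."""
    q = len(ALPHABET)
    cols = np.array([list(seq.upper()) for seq in alignment]).T
    scores = []
    for col in cols:
        freqs = np.array([np.mean(col == a) for a in ALPHABET])
        nz = freqs[freqs > 0]
        entropy = -np.sum(nz * np.log(nz))
        scores.append(np.log(q) - entropy)
    return np.array(scores)

# Toy alignment: only columns 3 and 4 vary.
seqs = ["MGCIKSKE", "MGCIKSKE", "MGCVRSKE", "MGCIKSKE"]
scores = column_conservation(seqs)
```

With only a few sequences these frequency estimates are very noisy; real analyses typically add pseudocounts and reweight closely related sequences, which this sketch omits.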

While considering each position separately has proved useful for identifying functional groups, a much richer behavior was found by considering pairwise correlations between sites. First of all, it has been shown that by taking into account both conservation and correlation, one can describe more accurately which sites of a protein are essential for its function than by considering conservation alone [Lichtarge 96].
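Pairwise co-variation between two alignment columns can be quantified in several ways. The sketch below uses the mutual information between columns, an assumed choice for illustration that is not necessarily the measure used in the works cited here, estimated from raw pair frequencies without pseudocounts. In the toy alignment, columns 0 and 1 co-vary perfectly (mutual information $\log 2$), while column 2 is statistically independent of column 0 (mutual information 0).

```python
import numpy as np
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (nats) between two alignment columns,
    estimated from raw pair frequencies (no pseudocounts or reweighting)."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * np.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi

# Toy alignment: columns 0 and 1 co-vary (A with K, R with E);
# column 2 takes each value once with A and once with R.
seqs = ["AKL", "AKV", "RES", "REL", "AKS", "REV"]
cols = list(zip(*seqs))
mi_01 = mutual_information(cols[0], cols[1])
mi_02 = mutual_information(cols[0], cols[2])
```

Like a raw correlation, a large mutual information does not tell whether two sites interact directly or through a third one, which is the problem discussed at the end of this section.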
Secondly, a remarkable experiment by Russ et al. [Russ 05] created artificial proteins by randomly picking amino acids with a probability distribution that reproduced the averages and the pairwise correlations of a group of homologous natural proteins. The authors showed that these new proteins fold into a native tertiary structure similar to that of the natural proteins of the group. Conversely, they showed that random proteins generated without taking correlations into account do not fold into a well-defined three-dimensional structure, an essential step for a protein to be functional.

Moreover, an interesting paper by Halabi et al. [Halabi 09] showed that the correlation matrix has a particular structure: the amino acids can be separated into disjoint groups (or sectors) that are correlated only with other amino acids in the same sector. Each sector has a distinct functional role and has evolved practically independently of the others.

Finally, studying two-site correlations was shown to be a very good way to infer which pairs of amino acids are spatially close in the three-dimensional structure of the protein [Burger 10]. Yet, some non-trivial work is needed to know whether two residues are correlated because they are spatially close to one another or because they are both spatially close to a third residue, a problem very similar to the one presented in section 1.1.1 for neurons. To solve such a problem, a paper published in 2009 [Weigt 09] proposed a very simplified model to describe a family of proteins composed of $N$ amino acids: it assumes that the proteins that constitute the family are randomly chosen among all possible proteins of length $N$ and that the probability of a given protein is

$$ P(A_1,\dots,A_N) = \frac{1}{Z}\exp\Big[\sum_{i<j} J_{ij}(A_i,A_j) + \sum_i h_i(A_i)\Big], $$

where $A_i \in \{1,\dots,22\}$ describes the $i$-th amino acid of the protein and $J_{ij}(A_i,A_j)$ and $h_i(A_i)$ are real-valued functions. This model is very similar to the Ising model we saw above, and it reduces the problem of finding which residues are actually close in the three-dimensional structure of the protein to the problem of finding which functions $J_{ij}$ and $h_i$ of the Hamiltonian best describe a set of measured two-site correlations.

To sum up, as we saw in section 1.1 for neurons, we are dealing with a large amount of correlated data in which pairwise correlations play a special role. While for neurons we wanted to infer a synaptic network, in this case we are interested in extracting an expression for the effective fitness of the proteins of the group, i.e., a quantity that says how well a protein performs its biological role as a function of its amino acid sequence.
Chapter 2

Some classical results on Ising-like models

As we have seen in chapter 1, Ising-like models are models of neural networks that are particularly suitable for analytical calculations. In this chapter, we present some classical results for some of these models, such as the Sherrington-Kirkpatrick and the Hopfield models. Since most of the behavior of a system can normally be deduced from its partition function Z, a model is usually said to be "solved" when this quantity has been evaluated explicitly. We start by recalling some results for the Ising model as it was originally defined. Then, for both the Hopfield and the Sherrington-Kirkpatrick models, we outline the procedure for evaluating Z, since it will be useful later in chapter 6. Indeed, as we will do similar calculations there, the comparison with these classical results will be enlightening.

2.1 The Ising model 

The original Ising model was proposed by Wilhelm Lenz and first studied by Lenz's PhD student Ernst Ising as a simple model for ferromagnetism and phase transitions. This model assumes that the atoms of a magnet are arranged on a lattice and that the spin of each atom $i$ is described by a binary variable $\sigma_i = \pm 1$. In addition, it assumes that each atom interacts only with its nearest neighbors, so we can write the energy of the system as

$$ H = -J\sum_{\langle i,j\rangle}\sigma_i\sigma_j - h\sum_i \sigma_i, \tag{2.1} $$

where $J$ is the energy of the interaction between neighbors, favoring aligned spins, and $h$ corresponds to an external magnetic field. The notation $\langle i,j\rangle$ means summing over all pairs $i, j$ such that $i$ and $j$ are nearest neighbors.

We suppose that the probability of the different states of the system is given by the Boltzmann distribution

$$ P(\{\sigma_1,\dots,\sigma_N\}) = \frac{1}{Z}\,e^{-\beta H(\{\sigma_1,\dots,\sigma_N\})}, \tag{2.2} $$

with

$$ Z = \sum_{\{\sigma\}} e^{-\beta H(\{\sigma_1,\dots,\sigma_N\})}, \tag{2.3} $$

where $\beta = 1/(k_B T)$, $k_B$ is the Boltzmann constant and $T$ is the temperature. In the following, unless explicitly stated, we will absorb the constant $\beta$ into the Hamiltonian to lighten the notation, but we may still use the terms "high temperature" and "low temperature" to refer to the magnitude of $J$ and $h$ in temperature units. The thermal average of a quantity $f(\{\sigma_1,\dots,\sigma_N\})$ is given by

$$ \langle f(\{\sigma_1,\dots,\sigma_N\})\rangle = \frac{1}{Z}\sum_{\{\sigma\}} f(\{\sigma_1,\dots,\sigma_N\})\,e^{-H(\{\sigma_1,\dots,\sigma_N\})}. \tag{2.4} $$

The concept of "nearest neighbors" depends both on the form of the lattice and on its dimension. The one-dimensional case, where spins are arranged on a line, was solved right after the model was proposed and shown to present no phase transition. With a brief calculation [Le Bellac 02], one can also find the two-site correlation in the $h = 0$ case,

$$ \langle\sigma_i\sigma_j\rangle = (\tanh J)^{|i-j|}. \tag{2.5} $$
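Eq. (2.5) is easy to check numerically. The sketch below assumes an open chain (free boundary conditions), for which the result holds exactly at any length, and sums over all $2^n$ configurations with $\beta$ absorbed into $J$, as in the text. The helper name `chain_correlation` is illustrative.

```python
import itertools
import numpy as np

def chain_correlation(n, J, i, j):
    """<s_i s_j> for an open 1D Ising chain at h = 0, by brute-force enumeration.
    beta is absorbed into J, as in the text."""
    total, z = 0.0, 0.0
    for s in itertools.product([-1, 1], repeat=n):
        s = np.array(s)
        w = np.exp(J * np.sum(s[:-1] * s[1:]))   # exp(-H) with H = -J sum_i s_i s_{i+1}
        z += w
        total += w * s[i] * s[j]
    return total / z

J = 0.7
for d in range(1, 5):
    print(d, chain_correlation(10, J, 2, 2 + d), np.tanh(J) ** d)
```

The exactness for the open chain follows from changing variables to the bond spins $\tau_k = \sigma_k\sigma_{k+1}$, which are independent with $\langle\tau_k\rangle = \tanh J$; with periodic boundary conditions the agreement holds only for long chains.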

In two dimensions, the Ising model was solved after a mathematical tour de force [Onsager 44] and shown to have a second-order phase transition that separates a ferromagnetic phase (where the magnetization $m = \langle\sigma_i\rangle$ is non-zero) from a paramagnetic phase of zero magnetization.

Another case that shows a phase transition is the infinite-dimensional limit of the model, where the lattice is a complete graph, i.e., each spin is a neighbor of every other one. In this case the Hamiltonian is given by

$$ H = -\frac{J}{N}\sum_{i<j}\sigma_i\sigma_j - h\sum_i\sigma_i, \tag{2.6} $$

where we rescaled $J \to J/N$ to keep the Hamiltonian extensive. It is a classical calculation to show that in this case the magnetization is given by the implicit equation

$$ m = \tanh(Jm + h), \tag{2.7} $$

which presents a ferromagnetic/paramagnetic phase transition at $J = 1$ (for $h = 0$). We can also obtain the connected correlation of the model, which to leading order in $1/N$ reads, for $i \neq j$,

$$ \langle\sigma_i\sigma_j\rangle - \langle\sigma_i\rangle\langle\sigma_j\rangle = \frac{1}{N}\,\frac{J(1-m^2)^2}{1 - J(1-m^2)}. \tag{2.8} $$
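Eq. (2.7) has no closed-form solution, but simple fixed-point iteration solves it. In the sketch below (illustrative tolerances and starting point), at $h = 0$ the iteration converges to $m = 0$ for $J < 1$ and to a non-zero magnetization for $J > 1$, illustrating the transition at $J = 1$; convergence becomes very slow close to $J = 1$. The helper `connected_correlation` evaluates the leading-order connected correlation for $i \neq j$, $J(1-m^2)^2/[N(1 - J(1-m^2))]$, which follows from differentiating Eq. (2.7) with respect to $h$ and using $\partial m/\partial h = (1-m^2) + N\,C_{i\neq j}$.

```python
import numpy as np

def curie_weiss_m(J, h, m0=0.5, tol=1e-12, max_iter=10_000):
    """Solve m = tanh(J m + h), Eq. (2.7), by fixed-point iteration from m0."""
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(J * m + h)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m   # may be unconverged very close to J = 1

def connected_correlation(J, m, n):
    """Leading-order connected correlation <s_i s_j> - m^2 for i != j."""
    chi = 1 - m**2
    return J * chi**2 / (n * (1 - J * chi))

m_para = curie_weiss_m(0.5, 0.0)   # paramagnetic side: m -> 0
m_ferro = curie_weiss_m(2.0, 0.0)  # ferromagnetic side: m close to 0.957
```

For $J > 1$ and $h = 0$ the sign of the solution depends on the starting point `m0`, reflecting the two symmetric ferromagnetic states.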

Besides the different choices of lattice, there are several possible generalizations of the model, expressed by small changes in the Hamiltonian (2.1). For example, one can add interactions between three sites with a term $J\sum_{ijk}\sigma_i\sigma_j\sigma_k$. Of particular interest for this work is the generalization of the lattice obtained by defining arbitrary two-site interactions and making the external field site-dependent:

$$ H = -\sum_{i<j} J_{ij}\,\sigma_i\sigma_j - \sum_i h_i\,\sigma_i, \tag{2.9} $$

as we have already seen in section 1.1.2. In this form, Ising models are also a privileged ground for modeling disordered systems, for two main reasons: first, they are especially convenient for obtaining exact results and, second, the Ising model and all its generalizations are particularly suitable for computer simulations using Monte Carlo methods [Krauth 06], with systems of up to a thousand spins being tractable.
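A minimal single-spin-flip Metropolis sampler for Eq. (2.9) is sketched below, with $\beta$ absorbed into the couplings. It is a plain textbook variant: random site selection, no burn-in removal and no acceptance-rate tuning, all of which a production code would handle more carefully. As a sanity check, for independent spins ($J_{ij} = 0$) in a field $h$ the sampled magnetization should approach $\tanh h$.

```python
import numpy as np

def metropolis(J, h, n_sweeps, rng):
    """Single-spin-flip Metropolis sampling of Eq. (2.9), beta absorbed.

    J must be symmetric with zero diagonal. Returns one configuration per
    sweep (N attempted flips); no burn-in is discarded here.
    """
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    samples = np.empty((n_sweeps, n), dtype=int)
    for t in range(n_sweeps):
        for i in rng.integers(n, size=n):
            # Energy change of flipping s_i: dE = 2 s_i (sum_j J_ij s_j + h_i).
            d_e = 2 * s[i] * (J[i] @ s + h[i])
            if d_e <= 0 or rng.random() < np.exp(-d_e):
                s[i] = -s[i]
        samples[t] = s
    return samples

rng = np.random.default_rng(4)
samples = metropolis(np.zeros((4, 4)), np.full(4, 0.5), 20_000, rng)
m_est = samples.mean()   # should be close to tanh(0.5) ~ 0.462
```

Successive samples are correlated, so the effective number of independent measurements is smaller than `n_sweeps`; this matters when estimating error bars on correlations.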

We will now look at some particular cases. 



2.2 Sherrington-Kirkpatrick model 

The Sherrington-Kirkpatrick model (or SK model) is a simplified model of disordered systems [Sherrington 75]. In this model, the Hamiltonian is given by

$$ H = -\sum_{i<j}\Big(\frac{J_0}{N} + J_{ij}\Big)\sigma_i\sigma_j, \tag{2.10} $$

where $J_0$ corresponds to a ferromagnetic component of the system. Each $J_{ij}$ is chosen randomly with a Gaussian distribution

$$ P(J_{ij}) = \frac{1}{\sqrt{2\pi J^2}}\, e^{-J_{ij}^2/(2J^2)}, \tag{2.11} $$

where $J$ represents the typical magnitude of the couplings. To work with intensive quantities, we set $J = \tilde J/\sqrt{N}$, with $\tilde J$ of order 1. We will denote the average of a quantity $f$ over the distribution of the $J_{ij}$ by $\overline{f}$, not to be confused with the thermal average $\langle f\rangle$.

In the following, we will look into the technical details of the solution of this model for two reasons. First, some interesting concepts emerge and, second, we will do a similar calculation in part III.



2.2.1 Replica solution of the SK model 

As we discussed at the beginning of the chapter, to solve this model we need to evaluate the free energy $F = -\log Z$, a quantity that depends on the particular sampling of the $J_{ij}$. Since the free energy is extensive, we expect it to be self-averaging, i.e., its value per spin converges to its average over the $J_{ij}$ as the size of the system increases. We would thus like to evaluate $\overline{\log Z}$ to find the typical behavior of the system. Since evaluating $\overline{Z^n}$ is much easier than evaluating $\overline{\log Z}$, we will first evaluate $\overline{Z^n}$ for every integer $n$ and assume that

$$ \overline{\log Z} = \lim_{n\to 0}\frac{\overline{Z^n} - 1}{n}. \tag{2.12} $$

This procedure, known as the replica trick [Edwards 75], correctly solves several statistical mechanics problems but is not mathematically rigorous: the limit depends on the behavior of $\overline{Z^n}$ for $n < 1$, which is not unambiguously defined by the analytic continuation of its values at integer $n$.
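The identity (2.12) itself can be illustrated on a toy SK ensemble, where $Z$ is summed exactly for $N = 6$ spins. The sizes, $J_0 = 0$ and the number of disorder samples are arbitrary choices for illustration. For real $n \to 0$ on a finite sample, $(\overline{Z^n} - 1)/n$ approaches the sample average of $\log Z$, with a correction of order $n$; this says nothing about the subtleties of continuing integer-$n$ results, which is where the non-rigorous step lies.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
N, n_samples, J_tilde = 6, 200, 1.0
states = np.array(list(itertools.product([-1, 1], repeat=N)))
iu = np.triu_indices(N, k=1)

log_z = np.empty(n_samples)
for k in range(n_samples):
    # Gaussian couplings with standard deviation J_tilde / sqrt(N), J_0 = 0.
    J = np.zeros((N, N))
    J[iu] = rng.normal(0.0, J_tilde / np.sqrt(N), size=len(iu[0]))
    # -H = sum_{i<j} J_ij s_i s_j (J is upper triangular), beta absorbed.
    minus_h = np.einsum("ki,ij,kj->k", states, J, states)
    log_z[k] = np.log(np.exp(minus_h).sum())

quenched = log_z.mean()   # disorder average of log Z
for n in (0.1, 0.01, 0.001):
    replica = (np.mean(np.exp(n * log_z)) - 1) / n   # (avg Z^n - 1) / n
    print(n, replica, quenched)
```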
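Initially, we have

$$ Z^n = \sum_{\{\sigma^1,\dots,\sigma^n\}} \exp\Big[\sum_{\alpha=1}^{n}\sum_{i<j}\Big(\frac{J_0}{N} + J_{ij}\Big)\sigma_i^\alpha\sigma_j^\alpha\Big], \tag{2.13} $$

which is just the partition function of $n$ identical, non-interacting copies (replicas) of the system, labeled by $\alpha = 1,\dots,n$. Evaluating its average over the Gaussian couplings, we obtain

$$ \begin{aligned} \overline{Z^n} &= \sum_{\{\sigma^\alpha\}} \exp\Big[\frac{J_0}{N}\sum_{\alpha=1}^{n}\sum_{i<j}\sigma_i^\alpha\sigma_j^\alpha + \frac{J^2}{2}\sum_{i<j}\Big(\sum_{\alpha=1}^{n}\sigma_i^\alpha\sigma_j^\alpha\Big)^2\Big] \\ &\simeq e^{N^2J^2n/4}\sum_{\{\sigma^\alpha\}}\exp\Big[\frac{J_0}{2N}\sum_{\alpha=1}^{n}\Big(\sum_i\sigma_i^\alpha\Big)^2 + \frac{J^2}{2}\sum_{1\le\alpha<\gamma\le n}\Big(\sum_i\sigma_i^\alpha\sigma_i^\gamma\Big)^2\Big], \end{aligned} \tag{2.14} $$

where in the last step we neglected terms subdominant in $N$.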
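Using a Gaussian (Hubbard-Stratonovich) integral transform, one for each pair $\alpha<\gamma$ and one for each replica $\alpha$, Eq. (2.14) can be written, up to normalization constants that do not contribute at leading order in $N$, as

$$ \overline{Z^n} \propto e^{N^2J^2n/4}\int\prod_{1\le\alpha<\gamma\le n} dq_{\alpha\gamma}\prod_{\alpha=1}^{n} dm_\alpha \sum_{\{\sigma^\alpha\}} e^{U}, \tag{2.15} $$

where $U$ is given by

$$ U = -\frac{N^2J^2}{2}\sum_{1\le\alpha<\gamma\le n} q_{\alpha\gamma}^2 - \frac{NJ_0}{2}\sum_{\alpha=1}^{n} m_\alpha^2 + NJ^2\sum_{1\le\alpha<\gamma\le n} q_{\alpha\gamma}\sum_i\sigma_i^\alpha\sigma_i^\gamma + J_0\sum_{\alpha=1}^{n} m_\alpha\sum_i\sigma_i^\alpha. \tag{2.16} $$

Note that with this writing the sites are decoupled. Consequently we have

$$ \sum_{\{\sigma^\alpha\}} e^{U} = \exp\Big\{-\frac{N^2J^2}{2}\sum_{\alpha<\gamma} q_{\alpha\gamma}^2 - \frac{NJ_0}{2}\sum_{\alpha} m_\alpha^2 + N\log\sum_{\sigma^1,\dots,\sigma^n=\pm1}\exp\Big[NJ^2\sum_{\alpha<\gamma} q_{\alpha\gamma}\sigma^\alpha\sigma^\gamma + J_0\sum_{\alpha} m_\alpha\sigma^\alpha\Big]\Big\}. \tag{2.17} $$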

Finally, we could in principle evaluate the integrals in Eq. (2.15) using the saddle-point approximation, since the exponent in Eq. (2.17) is of order $N$. Yet, finding the set of $q_{\alpha\gamma}$ and $m_\alpha$ that constitute the saddle point for an arbitrary $n$ is non-trivial.

To find the maximum of the exponent in Eq. (2.17), one classically assumes that all the different copies of the system have identical statistical properties. This is known as the replica-symmetric ansatz. Mathematically, it corresponds to setting $q_{\alpha\gamma} = q$ and $m_\alpha = m$. This hypothesis can be shown to yield a good approximation of the free energy and to correctly find the phase diagram of the model (Fig. 2.1), but at very low temperatures it yields a negative entropy, so this assumption is clearly unjustified in this regime.
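Under the replica-symmetric ansatz, the saddle-point conditions reduce to two self-consistent equations, quoted here in their standard form without derivation, with $\beta$ absorbed into the couplings and $\int Dz$ the average over a standard Gaussian variable $z$: $m = \int Dz\,\tanh(\tilde J\sqrt{q}\,z + J_0 m)$ and $q = \int Dz\,\tanh^2(\tilde J\sqrt{q}\,z + J_0 m)$. The sketch below solves them by damped fixed-point iteration, with Gauss-Hermite quadrature for the Gaussian average (the damping factor and number of nodes are arbitrary choices). The three parameter pairs land in the paramagnetic ($m = q = 0$), spin-glass ($m = 0$, $q > 0$) and ferromagnetic ($m > 0$) regions of Fig. 2.1.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes/weights for integrals against exp(-z^2/2).
Z_NODES, W = hermegauss(60)
W = W / np.sqrt(2 * np.pi)   # normalized: sum(W) == 1, a Gaussian average

def rs_saddle_point(j_tilde, j0, m=0.5, q=0.5, n_iter=5000, damping=0.5):
    """Damped fixed-point iteration of the replica-symmetric SK equations
    (beta absorbed): m = E_z[tanh(j_tilde sqrt(q) z + j0 m)],
                     q = E_z[tanh^2(j_tilde sqrt(q) z + j0 m)]."""
    for _ in range(n_iter):
        t = np.tanh(j_tilde * np.sqrt(q) * Z_NODES + j0 * m)
        m = damping * m + (1 - damping) * (W @ t)
        q = damping * q + (1 - damping) * (W @ t**2)
    return m, q

for j_tilde, j0 in [(0.5, 0.0), (2.0, 0.0), (0.5, 2.0)]:
    m, q = rs_saddle_point(j_tilde, j0)
    print(f"J~={j_tilde}, J0={j0}: m={m:.4f}, q={q:.4f}")
```

Near the transition lines the iteration converges slowly, and in the spin-glass region the replica-symmetric values are only an approximation to the Parisi solution discussed below.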



Figure 2.1: Phase diagram of the Sherrington-Kirkpatrick model [Sherrington 75] (axes: $J_0/J$ and $1/J$).

The parameters $m$ and $q$ have straightforward physical meanings in the replica-symmetric case: $m = \overline{\langle\sigma_i\rangle}$, which means that if $m \neq 0$, the system has a preferred magnetization that does not vanish after averaging over the disorder. We then say that the system is in a ferromagnetic phase. The other parameter can be written as $q = \frac{1}{N}\sum_i \overline{\langle\sigma_i\rangle^2}$. When $m = 0$ and $q \neq 0$, the system has non-zero local magnetizations $\langle\sigma_i\rangle$ for a given sampling of the $J_{ij}$, but they vanish when averaged over the $J_{ij}$. In this case, we say that the system is in a spin-glass phase, where it is frozen in one of several (random) local minima of the energy. Finally, the case $q = m = 0$ corresponds to the paramagnetic phase.

The correct saddle point of Eq. (2.17) was found in the late 1970s by G. Parisi, who
defined the value of the matrix $q_{a\gamma}$ at the saddle point through an iterative
procedure. Note that in the general case, $q_{a\gamma}$ is the overlap between the replicas $a$
and $\gamma$:

$$ q_{a\gamma} = \frac{1}{N}\sum_i\langle\sigma_i^a\sigma_i^\gamma\rangle . \tag{2.18} $$

His solution has a very interesting property: if we consider any three replicas $a$,
$\gamma$ and $\rho$ and their overlaps $q_{a\gamma}$, $q_{\gamma\rho}$ and $q_{a\rho}$, at least two of these overlaps are
equal and the third one is larger than or equal to them. We can represent the replicas
as the leaves of a tree in which the overlap between two replicas is set by the level of their
closest common ancestor, so that the distance between two leaves decreases as their overlap
increases (see Fig. 2.2). This distance defines an ultrametric structure
for the replicas. More precisely, we say that a metric space is ultrametric if for any
three points $x$, $y$, $z$ we have $d(x, z) \le \max\{d(x, y), d(y, z)\}$ [Rammal 86].
The details of the Parisi solution can be found in [Parisi 80].
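The ultrametric property can be checked on a toy example. The sketch below builds a hypothetical two-level hierarchical overlap matrix (made-up values $q_0 < q_1 < q_2$, not the actual Parisi saddle point; the function names are illustrative) and verifies that, for every triple of replicas, the two smallest overlaps coincide.

```python
import itertools
import numpy as np

def hierarchical_overlaps(n_outer, n_inner, block, q0, q1, q2):
    """Toy two-level overlap matrix: q2 inside the smallest blocks, q1 between blocks
    sharing an outer group, q0 otherwise (assumes q0 < q1 < q2)."""
    n = n_outer * n_inner * block
    Q = np.empty((n, n))
    for a, b in itertools.product(range(n), repeat=2):
        if a // block == b // block:
            Q[a, b] = q2
        elif a // (block * n_inner) == b // (block * n_inner):
            Q[a, b] = q1
        else:
            Q[a, b] = q0
    np.fill_diagonal(Q, 1.0)        # self-overlap, never used in the check below
    return Q

def is_ultrametric(Q):
    """True if, for every triple of distinct replicas, the two smallest overlaps are equal."""
    for a, b, c in itertools.combinations(range(len(Q)), 3):
        smallest, middle, _ = sorted((Q[a, b], Q[b, c], Q[a, c]))
        if smallest != middle:
            return False
    return True

Q = hierarchical_overlaps(n_outer=2, n_inner=2, block=2, q0=0.1, q1=0.4, q2=0.8)
print(is_ultrametric(Q))
```

Changing a single overlap, for instance setting $q_{02} = 0.5$, breaks the property for the triple of replicas (0, 1, 2).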




Figure 2.2: Topology of the distance between the different replicas for the Parisi solution. 



2.3 TAP Equations 

We will now present another way of solving the SK model that remains correct
in the low-temperature regime: the TAP equations. This solution is of particular
interest to this work since it shares some common points with our procedure for
solving the inverse Ising model presented in part II. In this section, we derive
these results following the work of Georges and Yedidia [Georges 91], since this
formulation will be particularly useful for what follows.

The TAP equations are a mean-field approximation for the SK model derived
by Thouless, Anderson and Palmer [Thouless 77]. Their starting point is the same
Hamiltonian as Eq. (2.10) with $J_0 = 0$:

$$ H = -\sum_{i<j}J_{ij}\sigma_i\sigma_j . \tag{2.19} $$



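To be able to perform a small-coupling expansion, we introduce an inverse temperature
$\beta$ in our Hamiltonian. We also add a Lagrange multiplier $\lambda_i(\beta)$ fixing $\langle\sigma_i\rangle = m_i$:

$$ H(\beta) = -\beta\sum_{i<j}J_{ij}\sigma_i\sigma_j - \sum_i\lambda_i(\beta)(\sigma_i - m_i), \tag{2.20} $$

and the corresponding partition function is

$$ Z = \sum_{\{\sigma\}}\exp\Big[\beta\sum_{i<j}J_{ij}\sigma_i\sigma_j + \sum_i\lambda_i(\beta)(\sigma_i - m_i)\Big]. \tag{2.21} $$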



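For $\beta = 0$, the Hamiltonian is trivial since it describes decoupled spins. In this
case,

$$ \tanh\lambda_i(0) = m_i \qquad\text{and}\qquad \log Z\big|_{\beta=0} = \sum_i\Big\{\log\big[2\cosh\lambda_i(0)\big] - \lambda_i(0)m_i\Big\}, \tag{2.22} $$

so that

$$ \log Z\big|_{\beta=0} = -\sum_i\Big[\frac{1+m_i}{2}\log\frac{1+m_i}{2} + \frac{1-m_i}{2}\log\frac{1-m_i}{2}\Big]. \tag{2.23} $$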
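For a general $\beta$, we perform a Taylor expansion around $\beta = 0$:

$$ \lambda_i(\beta) = \lambda_i(0) + \frac{\partial\lambda_i}{\partial\beta}\Big|_{\beta=0}\beta + \frac{\partial^2\lambda_i}{\partial\beta^2}\Big|_{\beta=0}\frac{\beta^2}{2} + \dots \tag{2.24} $$

and

$$ F(\beta) = \log Z = F(0) + \frac{\partial F}{\partial\beta}\Big|_{\beta=0}\beta + \frac{\partial^2F}{\partial\beta^2}\Big|_{\beta=0}\frac{\beta^2}{2} + \dots \tag{2.25} $$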



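Each one of the derivatives of this series can be written as a thermal average over
decoupled spins. For example,

$$ \frac{\partial F}{\partial\beta}\Big|_{\beta=0} = \frac{1}{Z}\sum_{\{\sigma\}}\Big[\sum_{i<j}J_{ij}\sigma_i\sigma_j + \sum_i\frac{\partial\lambda_i}{\partial\beta}\Big|_{\beta=0}(\sigma_i - m_i)\Big]\exp\Big[\sum_i\lambda_i(0)(\sigma_i - m_i)\Big] = \sum_{i<j}J_{ij}m_im_j, \tag{2.26} $$

and, since $\partial F/\partial m_i = -\lambda_i(\beta)$,

$$ \frac{\partial\lambda_i}{\partial\beta}\Big|_{\beta=0} = -\frac{\partial^2F}{\partial\beta\,\partial m_i}\Big|_{\beta=0} = -\sum_{j(\neq i)}J_{ij}m_j . \tag{2.27} $$

Continuing this expansion in $\beta$ up to the next order, we get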



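$$ F(\beta) = -\sum_i\Big[\frac{1+m_i}{2}\log\frac{1+m_i}{2} + \frac{1-m_i}{2}\log\frac{1-m_i}{2}\Big] + \beta\sum_{i<j}J_{ij}m_im_j + \frac{\beta^2}{2}\sum_{i<j}J_{ij}^2(1-m_i^2)(1-m_j^2) + O(\beta^3) \tag{2.28} $$

and

$$ \lambda_i(\beta = 1) = \tanh^{-1}m_i - \sum_{j(\neq i)}J_{ij}m_j + m_i\sum_{j(\neq i)}J_{ij}^2(1-m_j^2). \tag{2.29} $$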



Finally, to get back to our original Hamiltonian (2.19), we set $\beta = 1$ and $\lambda_i = 0$, obtaining:

$$ \log Z = -\sum_i\Big[\frac{1+m_i}{2}\log\frac{1+m_i}{2} + \frac{1-m_i}{2}\log\frac{1-m_i}{2}\Big] + \sum_{i<j}J_{ij}m_im_j + \frac12\sum_{i<j}J_{ij}^2(1-m_i^2)(1-m_j^2) \tag{2.30} $$

and

$$ \tanh^{-1}m_i = \sum_{j(\neq i)}J_{ij}m_j - m_i\sum_{j(\neq i)}J_{ij}^2(1-m_j^2), \tag{2.31} $$

which are the original TAP equations. Note that the next terms of this expansion involve
higher powers of $J_{ij}$, which in the SK model are $O(N^{-1/2})$, and are thus
negligible in the $N \to \infty$ limit. Note also that solving the $N$ coupled equations
(2.31) is a hard problem in general, but it is feasible in the limit $J \gg 1$ or close to the
spin-glass phase transition [Thouless 77].



2.4 Hopfield model 

In this section, we will discuss in more detail the Hopfield model that we have 
already presented in section 1.1.2, based on the work of Amit et al. [Amit 85a]. We 
will consider a more general Hamiltonian than the one presented previously, with 
the addition of local external fields: 



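$$ H = -\frac{1}{2N}\sum_{\mu=1}^{p}\sum_{i,j}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j - \sum_ih_i\sigma_i . \tag{2.32} $$

The corresponding partition function is

$$ Z = \sum_{\{\sigma\}}e^{-\beta H} = \Big(\frac{\beta N}{2\pi}\Big)^{p/2}\int\prod_{\mu=1}^{p}dm_\mu\sum_{\{\sigma\}}\exp\Big[-\frac{\beta N}{2}\sum_{\mu}m_\mu^2 + \beta\sum_{\mu}m_\mu\sum_i\xi_i^\mu\sigma_i + \beta\sum_ih_i\sigma_i\Big] \tag{2.33} $$

$$ Z = \Big(\frac{\beta N}{2\pi}\Big)^{p/2}\int\prod_{\mu=1}^{p}dm_\mu\,\exp\Big\{-\frac{\beta N}{2}\sum_{\mu}m_\mu^2 + \sum_i\log\Big[2\cosh\Big(\beta\sum_{\mu}m_\mu\xi_i^\mu + \beta h_i\Big)\Big]\Big\}. \tag{2.34} $$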



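If the number of patterns $p$ remains finite when $N \to \infty$, we can evaluate this integral
using the saddle-point approximation

$$ \log Z \simeq -\frac{\beta N}{2}\sum_{\mu=1}^{p}m_\mu^2 + \sum_i\log\Big[2\cosh\Big(\beta\sum_{\mu=1}^{p}m_\mu\xi_i^\mu + \beta h_i\Big)\Big], \tag{2.35} $$

where

$$ m_\mu = \frac{1}{N}\sum_i\xi_i^\mu\tanh\Big(\beta\sum_{\nu=1}^{p}m_\nu\xi_i^\nu + \beta h_i\Big). \tag{2.36} $$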
The solutions of Eq. (2.36) depend on the details of the patterns and on
the external fields. For instance, let us consider the case where $h_i = 0$ and the $\xi_i^\mu$ are drawn
at random according to a Bernoulli distribution $P(\xi_i^\mu = 1) = P(\xi_i^\mu = -1) = 1/2$. In
this case, one can show that if $\beta < 1$, the only solution of the saddle-point equations
is $m_\mu = 0$, which corresponds to a paramagnetic phase, while for $\beta > 1$ non-trivial
solutions of Eq. (2.36) do exist. The solutions that are global minima of the free
energy are the states where the magnetization along one pattern, $m_\mu$, is non-zero while
the others are zero; they correspond exactly to the thermodynamic states where
one retrieves the $\mu$-th pattern.
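The two regimes can be seen by iterating Eq. (2.36) directly. This is a minimal sketch, assuming $h_i = 0$, $p = 3$ unbiased random patterns on $N = 5000$ sites and plain fixed-point iteration started on pattern 1; the function name `hopfield_saddle_point` is hypothetical. Below $\beta = 1$ the iteration collapses to $m_\mu = 0$, above it the overlap with the first pattern stays finite while the others remain of order $N^{-1/2}$.

```python
import numpy as np

def hopfield_saddle_point(xi, beta, h, m0, n_iter=500):
    """Fixed-point iteration of Eq. (2.36); xi has shape (p, N), h shape (N,)."""
    N = xi.shape[1]
    m = np.asarray(m0, dtype=float)
    for _ in range(n_iter):
        local_field = beta * (m @ xi + h)        # beta * (sum_nu m_nu xi_i^nu + h_i)
        m = xi @ np.tanh(local_field) / N        # m_mu = (1/N) sum_i xi_i^mu tanh(...)
    return m

rng = np.random.default_rng(1)
p, N = 3, 5000
xi = rng.choice([-1.0, 1.0], size=(p, N))        # unbiased Bernoulli patterns
h = np.zeros(N)
m_start = [1.0, 0.0, 0.0]                        # start on pattern 1

m_hot = hopfield_saddle_point(xi, beta=0.5, h=h, m0=m_start)
m_cold = hopfield_saddle_point(xi, beta=2.0, h=h, m0=m_start)
print("beta=0.5:", np.round(m_hot, 3))
print("beta=2.0:", np.round(m_cold, 3))
```

For $\beta = 2$ the retrieved overlap is close to the solution of $m = \tanh(2m)$, as expected when the cross-talk with the other patterns is negligible.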

Dealing with the case $p = \alpha N$, for finite $\alpha$, is considerably harder. It can,
however, be treated with the replica trick [Amit 92], through a calculation similar to the one
we will see in part III. In this case, the system has a ferromagnetic phase,
where it retrieves one of the patterns, a paramagnetic phase and a spin-glass phase.
In the solution of Amit et al., as in the Sherrington-Kirkpatrick model, one needs
to make a replica-symmetric hypothesis to solve the saddle-point equations of the
problem. In the case of the Hopfield model, for all but the very lowest temperatures
the replica-symmetric solution yields the correct expression of the free energy. The
phase diagram of the model is depicted in Fig. 2.3.




Figure 2.3: Phase diagram of the Hopfield model [Amit 87] with $p = \alpha N$ patterns. The
temperature $T_g$ corresponds to a transition from a paramagnetic phase to a spin-glass
phase. For $T < T_M$ the patterns are local minima of the free energy, and for $T < T_c$
these minima are global. The temperature $T_R$ is the one below which the replica-symmetric
solution is incorrect (see inset).

2.5 Graphical models 

The Ising problem is a particular case of a class of problems known in the statistics 
community as undirected graphical models [Wainwright 08], which are statistical 
models where the probability distribution can be factorized over the cliques of a
certain graph $G$.

A clique of a graph $G$ is a subgraph $C \subseteq G$ that is fully connected (see Fig. 2.4).
We denote by $\mathcal{C}$ the set of maximal cliques of a graph $G$, i.e., cliques that are not
contained in any other clique. We say that a probability distribution over $N$ variables
$x_1,\dots,x_N$ is a graphical model if it can be factorized as

$$ P(x_1,\dots,x_N) = \frac{1}{Z}\prod_{C\in\mathcal{C}}p_C\big(\{x_k\}_{k\,\text{vertex of}\,C}\big), \tag{2.37} $$

where the underlying graph $G$ has $N$ vertices and $Z$ is a normalizing constant of the
probability.




Figure 2.4: In this graph the maximal cliques are the two dark-blue colored subgraphs, 
each one of the 11 light blue colored triangles and all the edges that are not part of any 
of them [Eppstein 07] . 

In the case of the Ising model described at the beginning of this chapter, the
underlying graph is the lattice and the maximal cliques are the edges that connect
two neighbors. The probability of a configuration for the most general graphical
model on a lattice is then

$$ P(x_1,\dots,x_N) = \frac{1}{Z}\prod_{\langle ij\rangle}p_{ij}(x_i,x_j) = \frac{1}{Z}\exp\Big[\sum_{\langle ij\rangle}\log p_{ij}(x_i,x_j)\Big]. \tag{2.38} $$

If we assume that our variables $x_i$ can take binary values $\pm1$, the function $p_{ij}(x_i,x_j)$
is a positive function $\{-1,1\}\times\{-1,1\}\to\mathbb{R}_+$, i.e., it can only take four different values:
$p_{ij}(+1,+1)$, $p_{ij}(+1,-1)$, $p_{ij}(-1,+1)$ and $p_{ij}(-1,-1)$. There are infinitely many ways
to express such a function with simple operations. We choose the one that most
resembles the Hamiltonian of a generalized Ising model:

$$ \log p_{ij}(x_i,x_j) = J_{ij}x_ix_j + h_{ij}x_i + h_{ji}x_j + K_{ij} . \tag{2.39} $$

Indeed, we can easily solve the linear system to find the four unknowns ($J_{ij}$,
$h_{ij}$, $h_{ji}$ and $K_{ij}$) as a function of the four different values of $p_{ij}(x_i,x_j)$.
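The linear system behind Eq. (2.39) is small enough to write out. The sketch below, with made-up factor values and a hypothetical helper `pairwise_to_ising`, solves it for one edge; the four configurations give four equations whose coefficient matrix is a $4\times4$ Hadamard-type matrix, so the solution always exists for positive $p_{ij}$.

```python
import numpy as np

# Order of the four configurations (x_i, x_j) and the linear system of Eq. (2.39):
# each row holds the coefficients of (J_ij, h_ij, h_ji, K_ij).
STATES = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
A = np.array([[xi * xj, xi, xj, 1.0] for xi, xj in STATES])

def pairwise_to_ising(p_table):
    """Return (J_ij, h_ij, h_ji, K_ij) such that log p_ij = J x_i x_j + h_ij x_i + h_ji x_j + K."""
    log_p = np.log([p_table[s] for s in STATES])
    return np.linalg.solve(A, log_p)

# Made-up positive factor values, for illustration only.
p_example = {(1, 1): 0.4, (1, -1): 0.1, (-1, 1): 0.2, (-1, -1): 0.3}
J, h_ij, h_ji, K = pairwise_to_ising(p_example)
print(f"J={J:.4f}, h_ij={h_ij:.4f}, h_ji={h_ji:.4f}, K={K:.4f}")
```

In closed form, $J_{ij} = \frac14\log\frac{p_{ij}(+,+)\,p_{ij}(-,-)}{p_{ij}(+,-)\,p_{ij}(-,+)}$, which gives $\frac14\log 6$ for these values.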
Absorbing all the constants $K_{ij}$ in the normalization $Z$ and setting $h_i = \sum_j h_{ij}$ (sum over
the neighbors $j$ of $i$), our probability is

$$ P(x_1,\dots,x_N) = \frac{1}{Z}\exp\Big(\sum_{\langle ij\rangle}J_{ij}x_ix_j + \sum_ih_ix_i\Big), \tag{2.40} $$

which corresponds to the probability of an Ising-like model with couplings between
nearest neighbors, where both the local fields and the couplings between the
neighbors are site-dependent¹.



2.5.1 Message-passing algorithms 

Expectation propagation is an interesting approximation for the general
problem of evaluating averages according to a graphical model. The starting point of
this method is the fact that when the underlying graph is a tree, these averages can be
evaluated exactly. Suppose that we want to evaluate

$$ P(x_s) = \sum_{\{x_1,\dots,x_{s-1},x_{s+1},\dots,x_N\}}P(x_1,\dots,x_N). \tag{2.41} $$

We choose to represent our tree with $s$ as its root. In this case, we can write

$$ P(x_s) = \frac{1}{Z}\sum_{\{x_1,\dots,x_{s-1},x_{s+1},\dots,x_N\}}\ \prod_{(r,d)\in E(G)}p_{r,d}(x_r,x_d), \tag{2.42} $$

where $E(G)$ is the set of edges of the graph $G$. We can decompose this expression
over the branches starting at $s$.




Figure 2.5: Example tree. Note the branches $T_t$, $T_u$, $T_v$ and $T_w$ starting at its root.



For the tree shown in Fig. 2.5, for example, it will be 



¹ Note that allowing non-uniform couplings between neighbors leads to systems with much more
complex behavior than a simple ferromagnetic-paramagnetic transition. For an example, see
the Edwards-Anderson model [Edwards 75].
$$
\begin{aligned}
P(x_s) = \frac{1}{Z}\ &\Big[\sum_{\{x_r\}_{r\in V(T_t)}}p_{s,t}(x_s,x_t)\prod_{(r,d)\in E(T_t)}p_{r,d}(x_r,x_d)\Big] \\
\times\ &\Big[\sum_{\{x_r\}_{r\in V(T_u)}}p_{s,u}(x_s,x_u)\prod_{(r,d)\in E(T_u)}p_{r,d}(x_r,x_d)\Big] \\
\times\ &\Big[\sum_{\{x_r\}_{r\in V(T_v)}}p_{s,v}(x_s,x_v)\prod_{(r,d)\in E(T_v)}p_{r,d}(x_r,x_d)\Big] \\
\times\ &\Big[\sum_{\{x_r\}_{r\in V(T_w)}}p_{s,w}(x_s,x_w)\prod_{(r,d)\in E(T_w)}p_{r,d}(x_r,x_d)\Big],
\end{aligned} \tag{2.43}
$$



where each factor in the product represents the contribution of one branch. As we
can see, we transformed Eq. (2.42) into four independent problems, one per
branch, which can be solved separately. By repeating the procedure recursively, it
is possible to solve the problem with a number of operations that grows only linearly
with the number of vertices. Note that the same divide-and-conquer method can also be
used to evaluate $Z$.

We would now like to reformulate this solution as an algorithm that is also
well defined on graphs with cycles, even if it is then neither guaranteed to give an exact
solution nor to converge. The algorithm works by passing, at each iteration, messages
$M_{ts}$ between every two vertices $s$ and $t$ connected by an edge, according to the iterative
relation

$$ M_{tr}(x_r) \leftarrow \kappa\sum_{x'_t}p_{rt}(x_r,x'_t)\prod_{u\in N(t),\,u\neq r}M_{ut}(x'_t), \tag{2.44} $$

where $N(t)$ is the set of neighbors of $t$ and $\kappa$ is a normalization constant fixing
$\sum_{x_r}M_{tr}(x_r) = 1$.

We can recover $P(x_s)$ with the formula

$$ P(x_s) = \kappa\prod_{t\in N(s)}M_{ts}(x_s). \tag{2.45} $$



Note that on a tree the fixed point of this algorithm coincides with the exact solution (2.43). This
algorithm is the simplest message-passing algorithm and is known as belief propagation. Several
variations of this algorithm can be found in the literature [Wainwright 08].
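The updates (2.44)-(2.45) can be sketched in a few lines for binary variables. The example below is a minimal sketch, assuming pairwise factors stored as positive $2\times2$ arrays (index 0 standing for $-1$ and 1 for $+1$), a fixed number of synchronous sweeps, and a small tree shaped like Fig. 2.5 (root 0 with four branches) with random made-up factors. On this tree the belief-propagation marginals agree with brute-force enumeration. The function names are hypothetical.

```python
import itertools
import numpy as np

def belief_propagation(n, edges, psi, n_iter=50):
    """Sum-product updates of Eq. (2.44) for binary variables (index 0 <-> -1, 1 <-> +1).
    psi[(r, t)] is a positive 2x2 array holding p_rt(x_r, x_t)."""
    nbrs = {v: [] for v in range(n)}
    for r, t in edges:
        nbrs[r].append(t)
        nbrs[t].append(r)

    def factor(r, t):
        # p_rt as a matrix indexed [x_r, x_t], whichever orientation was stored.
        return psi[(r, t)] if (r, t) in psi else psi[(t, r)].T

    # M[(t, r)] is the message M_tr(x_r) sent from t to r, initialized uniform.
    M = {(t, r): np.full(2, 0.5) for t in range(n) for r in nbrs[t]}
    for _ in range(n_iter):
        new_M = {}
        for t, r in M:
            incoming = np.ones(2)
            for u in nbrs[t]:
                if u != r:
                    incoming = incoming * M[(u, t)]
            msg = factor(r, t) @ incoming           # sum over x'_t
            new_M[(t, r)] = msg / msg.sum()         # normalization kappa
        M = new_M

    beliefs = np.ones((n, 2))
    for s in range(n):
        for t in nbrs[s]:
            beliefs[s] *= M[(t, s)]                 # Eq. (2.45)
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_force_marginals(n, psi):
    """Exact single-site marginals by summing over all 2^n configurations."""
    P = np.zeros((n, 2))
    for x in itertools.product([0, 1], repeat=n):
        w = np.prod([table[x[r], x[t]] for (r, t), table in psi.items()])
        for s in range(n):
            P[s, x[s]] += w
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]    # root 0 with four branches
psi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}
bp = belief_propagation(6, edges, psi)
exact = brute_force_marginals(6, psi)
print(np.max(np.abs(bp - exact)))
```

The brute-force sum costs $2^n$ operations, while each belief-propagation sweep is linear in the number of edges, which is the point of the tree decomposition.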
Chapter 3
Inverse Problems 



As exemplified in the previous chapters, most of the problems in statistical
mechanics consist of describing the collective behavior of a large number of interacting
parts. In general, the individual behavior of the parts and the way they interact are either
described by first principles or can be measured very accurately. Unfortunately, as
we have seen in chapter 1, for some problems, such as neuron networks, the behavior of
the parts and/or how they interact is not known, even if we can measure their
collective behavior. In these cases, we would like to deduce the behavior and interactions
of the parts from the available data. We then speak of inverse problems.

Inverse problems are often ill-posed, i.e., there is more than one set
of laws or parameters that can describe the observed data. To give an example,
suppose all we know about a real-valued random variable $x$ is that $\langle x\rangle = 0$ and
$\langle x^2\rangle = 1$. Even if we restrict ourselves to two-valued (Bernoulli-like) distributions, there
are infinitely many distributions satisfying our conditions: for any real $a \neq 0$, $P(x = a) = 1/(1+a^2)$ and
$P(x = -1/a) = 1 - 1/(1+a^2)$ meet our requirements.

Intuitively, a possible criterion for choosing among all these distributions is to
look for the least "restrictive" one, i.e., the one that spreads its probability over the possible
values as evenly as possible. To formalize this criterion, we need to define the Shannon entropy of a
statistical distribution. We will thus start this chapter by defining this entropy and
showing how its optimization puts inverse problems in a well-defined framework.
Next, we will present Bayesian inference, which is a complementary
approach to entropy optimization. Finally, we will define and present some
known results for the inverse problem of most interest for this work: the inverse
Ising problem.



3.1 Maximal entropy distribution

The Shannon entropy of a random variable is a measure of the quantity of
information unknown about it. It is defined by the sum

$$ S = -\sum_{\Omega}P(\Omega)\log P(\Omega), \tag{3.1} $$

where $P(\Omega)$ is the probability of the configuration $\Omega$ of the system. Its interpretation
as a quantity of information comes from Shannon's source coding theorem,
which states that the best theoretically possible compression algorithm can encode
a sample of $N$ values drawn from the distribution $P$ using $NS$ bits in the $N \to \infty$
limit (with the logarithm in Eq. (3.1) taken in base 2).
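As a small illustration, the sketch below evaluates Eq. (3.1) in bits for the two-valued family introduced at the beginning of this chapter. It checks that every member has $\langle x\rangle = 0$ and $\langle x^2\rangle = 1$, and shows that within this family the entropy is largest for $a = \pm 1$ (equal weights on $x = \pm1$). The helper names and the sample values of $a$ are illustrative assumptions.

```python
import numpy as np

def shannon_entropy_bits(probs):
    """S = -sum P log2 P, skipping zero-probability states."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return float(-(nz * np.log2(nz)).sum())

def two_point_family(a):
    """P(x = a) = 1/(1+a^2), P(x = -1/a) = a^2/(1+a^2), for real a != 0."""
    values = np.array([a, -1.0 / a])
    probs = np.array([1.0 / (1 + a**2), a**2 / (1 + a**2)])
    return values, probs

for a in [0.5, 1.0, 3.0]:
    x, P = two_point_family(a)
    mean, second_moment = (P * x).sum(), (P * x**2).sum()
    print(f"a={a}: <x>={mean:+.3f}, <x^2>={second_moment:.3f}, S={shannon_entropy_bits(P):.3f} bits")
```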

If we are looking for the most general distribution that reproduces a set of
averages $\bar f_i = \langle f_i(X)\rangle$, it is reasonable to look for the one that maximizes $S$. This
is known as the principle of maximum entropy. It can be interpreted as choosing the model
that satisfies our constraints, i.e., reproduces the prescribed set of averages, while
imposing as few extra conditions as possible.

Let us now consider the interesting case of binary random variables $\sigma_i = \pm1$,
constrained to satisfy a set of local averages $\langle\sigma_i\rangle = m_i$ and correlations $\langle\sigma_i\sigma_j\rangle = C_{ij}$
[Tkacik 06]. In principle, one could also consider imposing higher-order correlations,
like the three-site ones $C_{ijk} = \langle\sigma_i\sigma_j\sigma_k\rangle$, but doing so would only be useful in
situations where such high-order correlations are known precisely. Unfortunately, extracting
such data from an experimental system requires measuring a very large number of
configurations of the system, which is rarely possible. We thus choose to deal
only with one- and two-site correlations. In this case, we generically denote by
$P(\{\sigma_i\}) = p_{\sigma_1,\dots,\sigma_N}$ the probability of a configuration $\{\sigma_1,\dots,\sigma_N\}$, and we can write the entropy as

$$ S = -\sum_{\{\sigma\}}p_{\sigma_1,\dots,\sigma_N}\log p_{\sigma_1,\dots,\sigma_N} . \tag{3.2} $$
M 

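In order to impose the constraints on the averages and correlations, we add the
Lagrange multipliers $h_i$, $J_{ij}$ and $\lambda$, respectively associated with $m_i = \langle\sigma_i\rangle$, $C_{ij} = \langle\sigma_i\sigma_j\rangle$
and the normalization of the probability $\sum_{\{\sigma\}}p_{\{\sigma\}} = 1$, where $p_{\{\sigma\}}$ is short for $p_{\sigma_1,\dots,\sigma_N}$. We obtain

$$ S = -\sum_{\{\sigma\}}p_{\{\sigma\}}\log p_{\{\sigma\}} + \sum_ih_i\Big(\sum_{\{\sigma\}}p_{\{\sigma\}}\sigma_i - m_i\Big) + \sum_{i<j}J_{ij}\Big(\sum_{\{\sigma\}}p_{\{\sigma\}}\sigma_i\sigma_j - C_{ij}\Big) + \lambda\Big(\sum_{\{\sigma\}}p_{\{\sigma\}} - 1\Big). \tag{3.3} $$

Setting the derivative of $S$ with respect to $p_{\sigma_1,\dots,\sigma_N}$ to zero, we obtain

$$ 0 = \lambda - 1 - \log p_{\sigma_1,\dots,\sigma_N} + \sum_ih_i\sigma_i + \sum_{i<j}J_{ij}\sigma_i\sigma_j . \tag{3.4} $$

Solving Eq. (3.4), we derive the probability distribution

$$ P(\{\sigma_i\}) = e^{\lambda-1}\exp\Big(\sum_{i<j}J_{ij}\sigma_i\sigma_j + \sum_ih_i\sigma_i\Big), \tag{3.5} $$

which corresponds exactly to the Boltzmann distribution for the generalized Ising
model we presented at the beginning of chapter 2 (Eqs. (2.2) and (2.9)) with $\lambda - 1 =
-\log Z$.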

At this point, we know how the probability distribution depends on $J_{ij}$ and on
$h_i$. To solve this problem completely, we need to express $J_{ij}$ and $h_i$ in terms of the
imposed averages and correlations. Since $J_{ij}$ and $h_i$ are Lagrange multipliers, the
values they should take to reproduce our averages and correlations correspond to an
extremum of the entropy, now regarded as a function of the multipliers,
$S(\{J_{ij}\},\{h_i\}) = \log Z - \sum_{i<j}J_{ij}C_{ij} - \sum_ih_im_i$. To know whether it is a maximum or a minimum, we can
explicitly compute the second derivatives of $S$ with respect to $J_{ij}$ and $h_i$ (they form the covariance
matrix of the $\sigma_i$ and $\sigma_i\sigma_j$) and see that the entropy is a convex function
of the parameters $J_{ij}$ and $h_i$. Thus, we need to look for the set of parameters that
minimizes the entropy. Moreover, the convexity ensures that if a minimum exists,
it is unique.

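To illustrate, let us examine the case of a two-spin system with $\langle\sigma_1\rangle = m_1$, $\langle\sigma_2\rangle =
m_2$ and $\langle\sigma_1\sigma_2\rangle = C$. The entropy is given by

$$
\begin{aligned}
S &= \log\Big\{\sum_{\sigma_1,\sigma_2=\pm1}\exp\big[J\sigma_1\sigma_2 + h_1\sigma_1 + h_2\sigma_2\big]\Big\} - JC - h_1m_1 - h_2m_2 \\
&= \log\big[e^{J+h_1+h_2} + e^{J-h_1-h_2} + e^{-J+h_1-h_2} + e^{-J-h_1+h_2}\big] - JC - h_1m_1 - h_2m_2 .
\end{aligned} \tag{3.6}
$$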



Since we have only two spins, the optimization of $S$ with respect to $h_1$, $h_2$ and $J$
can be done explicitly and we obtain

$$ J = \frac14\log\frac{(1+C-m_1-m_2)(1+C+m_1+m_2)}{(1-C-m_1+m_2)(1-C+m_1-m_2)}, \tag{3.7} $$

$$ h_1 = \frac14\log\frac{(1-C+m_1-m_2)(1+C+m_1+m_2)}{(1+C-m_1-m_2)(1-C-m_1+m_2)}, \tag{3.8} $$

$$ h_2 = \frac14\log\frac{(1-C-m_1+m_2)(1+C+m_1+m_2)}{(1+C-m_1-m_2)(1-C+m_1-m_2)}. \tag{3.9} $$
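A quick numerical check of Eqs. (3.7)-(3.9): the sketch below computes $J$, $h_1$, $h_2$ for arbitrary admissible moments (the values $m_1 = 0.3$, $m_2 = -0.1$, $C = 0.2$ are assumptions chosen for illustration) and verifies, by summing over the four configurations, that the resulting distribution reproduces them. It relies on the fact that $P(\sigma_1,\sigma_2) = (1 + \sigma_1m_1 + \sigma_2m_2 + \sigma_1\sigma_2C)/4$ for any two-spin distribution with these moments; the function names are hypothetical.

```python
import itertools
import numpy as np

def two_spin_parameters(m1, m2, C):
    """J, h1, h2 from Eqs. (3.7)-(3.9); C is the full correlation <s1 s2>."""
    p_pp = 1 + C + m1 + m2      # each term is 4 P(s1, s2)
    p_mm = 1 + C - m1 - m2
    p_pm = 1 - C + m1 - m2
    p_mp = 1 - C - m1 + m2
    J = 0.25 * np.log(p_pp * p_mm / (p_pm * p_mp))
    h1 = 0.25 * np.log(p_pp * p_pm / (p_mp * p_mm))
    h2 = 0.25 * np.log(p_pp * p_mp / (p_pm * p_mm))
    return J, h1, h2

def two_spin_moments(J, h1, h2):
    """Exact <s1>, <s2>, <s1 s2> under P ~ exp(J s1 s2 + h1 s1 + h2 s2)."""
    s = np.array(list(itertools.product([-1, 1], repeat=2)), dtype=float)
    w = np.exp(J * s[:, 0] * s[:, 1] + h1 * s[:, 0] + h2 * s[:, 1])
    w /= w.sum()
    return w @ s[:, 0], w @ s[:, 1], w @ (s[:, 0] * s[:, 1])

J, h1, h2 = two_spin_parameters(m1=0.3, m2=-0.1, C=0.2)
print(two_spin_moments(J, h1, h2))
```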



3.2 Bayesian inference 

Suppose now that we are not only interested in finding the best parameters to
fit some data, but also in assigning a probability distribution to the set of
possible parameters. If our model depends on a set of unknown parameters $\{\lambda_i\}$,
we could in principle write the probability $P(\{X_i\}|\{\lambda_i\})$ of measuring any set of
configurations $\{X_i\}$. Bayes' theorem states that the probability of a set of
parameters given a set of measurements $\{X_i\}$ is

$$ P(\{\lambda_i\}|\{X_i\}) = \frac{P(\{X_i\}|\{\lambda_i\})\,P_0(\{\lambda_i\})}{P(\{X_i\})}, \tag{3.10} $$

where $P_0(\{\lambda_i\})$ is the a priori probability of the parameters $\{\lambda_i\}$ and $P(\{X_i\})$ is
the marginal probability of $\{X_i\}$, which can also be interpreted as a normalization
constant:

$$ P(\{X_i\}) = \sum_{\{\lambda_i\}}P(\{X_i\}|\{\lambda_i\})\,P_0(\{\lambda_i\}). \tag{3.11} $$

If one is looking for the set of parameters $\{\lambda_i\}$ that best describes the measurements
$\{X_i\}$, a natural choice is the one that maximizes Eq. (3.10). This choice is known
as the maximum a posteriori (MAP) estimator.
On the other hand, when the prior is not known, a common procedure is to look
for the set of $\{\lambda_i\}$ that maximizes $P(\{X_i\}|\{\lambda_i\})$, which amounts to setting the
prior to $P_0(\{\lambda_i\}) = 1$. We call this procedure maximum likelihood estimation.

To illustrate, suppose that we have a coin that we know is biased in the following
way: $P(\text{favored side}) = 1/2 + \epsilon$ and $P(\text{unfavored side}) = 1/2 - \epsilon$, but we do not
know whether heads or tails is favored. We toss this coin three times and get three heads.
Using Bayes' theorem, we have

$$ P(\text{tails favored}|3\text{ heads}) = \frac{P(3\text{ heads}|\text{tails favored})\,P_0(\text{tails favored})}{P(3\text{ heads})}. \tag{3.12} $$

Since we have no prior knowledge of whether heads or tails is favored, we have $P_0(\text{tails favored}) =
P_0(\text{heads favored}) = 1/2$, which leads to

$$ P(\text{tails favored}|3\text{ heads}) = \frac{(\frac12-\epsilon)^3}{(\frac12-\epsilon)^3 + (\frac12+\epsilon)^3}, \tag{3.13} $$

and

$$ P(\text{heads favored}|3\text{ heads}) = \frac{(\frac12+\epsilon)^3}{(\frac12-\epsilon)^3 + (\frac12+\epsilon)^3}. \tag{3.14} $$

Unsurprisingly, we conclude that it is more likely that the coin's favored side is
heads.
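For a concrete number, the sketch below evaluates Eqs. (3.12)-(3.14) with an assumed bias $\epsilon = 0.1$ and the flat prior over the favored side; `posterior_favored_side` is a hypothetical helper. Three heads already give roughly a 77% posterior probability that heads is the favored side.

```python
def posterior_favored_side(eps, n_heads, n_tails, prior_heads=0.5):
    """Posterior probabilities that heads / tails is the favored side (Eqs. (3.12)-(3.14))."""
    like_heads = (0.5 + eps) ** n_heads * (0.5 - eps) ** n_tails   # heads favored
    like_tails = (0.5 - eps) ** n_heads * (0.5 + eps) ** n_tails   # tails favored
    evidence = prior_heads * like_heads + (1 - prior_heads) * like_tails   # Eq. (3.11)
    return prior_heads * like_heads / evidence, (1 - prior_heads) * like_tails / evidence

p_heads, p_tails = posterior_favored_side(eps=0.1, n_heads=3, n_tails=0)
print(f"P(heads favored | 3 heads) = {p_heads:.4f}, P(tails favored | 3 heads) = {p_tails:.4f}")
```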

Let us now look at a slightly different situation where the bias $\epsilon$ is unknown.
As before, there is an unknown favored side and heads are obtained three times.
We would like to determine $\epsilon$. Writing the probability of heads as $1/2 + \epsilon$, with
$\epsilon \in [-1/2, 1/2]$ (a negative $\epsilon$ meaning that tails is favored), Eq. (3.10) gives an
expression similar to Eq. (3.12):

$$ P(\epsilon|3\text{ heads}) = \frac{(\frac12+\epsilon)^3P_0(\epsilon)}{\int_{-1/2}^{1/2}(\frac12+\epsilon')^3P_0(\epsilon')\,d\epsilon'}. \tag{3.15} $$

Note the normalization, according to Eq. (3.11), in the denominator. In this
case the probability depends strongly on the prior, which is unknown in
the majority of inference problems. Fortunately, this problem becomes less and less
important as the amount of data increases. For example, suppose that instead
of doing just three coin tosses, we toss the coin a large number of times, getting $N$ heads
and $M$ tails. In this case, Eq. (3.15) becomes

$$ P(\epsilon|N\text{ heads}, M\text{ tails}) = \frac{(\frac12+\epsilon)^N(\frac12-\epsilon)^MP_0(\epsilon)}{\int_{-1/2}^{1/2}(\frac12+\epsilon')^N(\frac12-\epsilon')^MP_0(\epsilon')\,d\epsilon'}, \tag{3.16} $$

where the binomial coefficient $\binom{N+M}{N}$ cancels out with the normalization.



Since the function $(\frac12+\epsilon)^N(\frac12-\epsilon)^M$ has a very sharp peak around $\epsilon = (N-M)/(2M +
2N)$, we can make the following approximation:

$$ \Big(\frac12+\epsilon\Big)^N\Big(\frac12-\epsilon\Big)^MP_0(\epsilon) \approx \Big(\frac12+\epsilon\Big)^N\Big(\frac12-\epsilon\Big)^MP_0\Big(\frac{N-M}{2M+2N}\Big). \tag{3.17} $$

If we apply this approximation to Eq. (3.16), the value $P_0\big(\frac{N-M}{2M+2N}\big)$ appears both in
the numerator and in the denominator and cancels out. The probability is thus
independent of the unknown function $P_0(\epsilon)$.
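The washing-out of the prior can be seen numerically. The sketch below evaluates Eq. (3.16) on a grid for two priors, a uniform one and a hypothetical "tilted" prior $P_0(\epsilon) \propto (1/2 - \epsilon)^2$ that favors tails (both are assumptions for illustration), and compares posterior means. With three heads the two priors give very different answers; with 600 heads and 400 tails both posteriors concentrate near $(N-M)/(2N+2M) = 0.1$.

```python
import numpy as np

eps = np.linspace(-0.5, 0.5, 20001)[1:-1]        # grid on the open interval (-1/2, 1/2)
priors = {
    "uniform": np.ones_like(eps),
    "tilted": (0.5 - eps) ** 2,                  # hypothetical prior favoring tails
}

def posterior(n_heads, n_tails, prior):
    """Eq. (3.16) on the grid; the integral is replaced by a discrete normalization."""
    log_w = n_heads * np.log(0.5 + eps) + n_tails * np.log(0.5 - eps) + np.log(prior)
    w = np.exp(log_w - log_w.max())              # subtract the max to avoid underflow
    return w / w.sum()

def posterior_mean(n_heads, n_tails, prior):
    return float(eps @ posterior(n_heads, n_tails, prior))

few = {name: posterior_mean(3, 0, pr) for name, pr in priors.items()}
many = {name: posterior_mean(600, 400, pr) for name, pr in priors.items()}
print("3 heads:             ", few)
print("600 heads, 400 tails:", many)
```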

In other situations, the prior can be useful to make an inference procedure more
robust. Suppose for example that we are measuring a system composed of a large
number of spins. By pure coincidence (or lack of data), two particular sites, $i$
and $j$, have identical spin values in all the measured configurations. Without defining
a prior (i.e., setting $P_0(\alpha) = 1$), the algorithm will infer an infinite coupling
between the two sites to account for this, which is unphysical and numerically
problematic. On the other hand, if we suppose that $P_0(\alpha)$ is a Gaussian distribution,
the prior will push the inferred values away from very large values, avoiding the
problematic solutions.
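A two-spin toy version of this effect is sketched below. It assumes zero fields (the data contain $(+,+)$ and $(-,-)$ equally often, so $m_1 = m_2 = 0$), $L$ samples with empirical $\langle\sigma_1\sigma_2\rangle = C$, and a Gaussian prior of standard deviation `prior_std` on $J$; the helper `map_coupling` is hypothetical. With $C = 1$ the likelihood alone has no maximum (it requires $\tanh J = 1$), whereas the MAP coupling is finite and grows only slowly with $L$. For $C < 1$ and a very broad prior it tends to the maximum-likelihood value $\tanh^{-1}C$.

```python
import numpy as np

def map_coupling(C, L, prior_std, tol=1e-12):
    """MAP coupling for two spins with h = 0, empirical <s1 s2> = C (0 <= C <= 1),
    L samples and a Gaussian prior of standard deviation prior_std on J.
    Maximizes L*(J*C - log(4 cosh J)) - J**2 / (2 prior_std**2) by bisection
    on its decreasing derivative g(J) = L*(C - tanh J) - J / prior_std**2."""
    def g(J):
        return L * (C - np.tanh(J)) - J / prior_std**2

    lo, hi = 0.0, L * prior_std**2 * (1.0 + C)    # g(lo) >= 0 and g(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Spins 1 and 2 always equal in the data: C = 1, where maximum likelihood diverges.
for L in (10, 100, 1000):
    print(L, round(map_coupling(C=1.0, L=L, prior_std=1.0), 3))
```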

3.2.1 Relationship with entropy maximization 

Suppose that we make $L$ independent measurements $\{\sigma^l\}$ of a system we would
like to describe using a set of parameters $\alpha$. Since the measurements are independent,
we can write

$$ \log P(\{\sigma\}|\alpha) = \sum_{l=1}^{L}\log P(\sigma^l|\alpha). \tag{3.18} $$

Using the maximum a posteriori principle and Bayes' theorem, the set of parameters that
best describes the data is

$$ \alpha^* = \arg\max_{\alpha'}\Big[\log P_0(\alpha') + \sum_{l=1}^{L}\log P(\sigma^l|\alpha')\Big], \tag{3.19} $$

where $P_0(\alpha)$ is the prior probability of $\alpha$. If we want to use the principle of
maximum entropy instead, we should estimate the entropy from the data as

$$ S(\alpha) = -\frac{1}{L}\sum_{l=1}^{L}\log P(\sigma^l|\alpha) \simeq -\langle\log P(\sigma|\alpha)\rangle_\sigma . \tag{3.20} $$

Using the definition of an average, we can show that $S(\alpha)$ corresponds to the usual
definition of the entropy,

$$ S(\alpha) \simeq -\sum_{\{\sigma\}}P(\sigma|\alpha)\log P(\sigma|\alpha). \tag{3.21} $$

As we saw in the last section, we should then minimize $S(\alpha)$ with respect to the
parameters $\alpha$, which corresponds exactly to maximizing $P(\{\sigma\}|\alpha)$, as one would do
with the maximum likelihood method.

3.3 The inverse Ising problem: some results from 
the literature 

We call the problem of finding the set of couplings $\{J_{ij}\}$ and local fields $\{h_i\}$
from the set of magnetizations $\{m_i\}$ and correlations $\{C_{ij}\}$ of a generalized Ising
model the inverse generalized Ising problem [Schneidman 06]. In the following, we
will omit the word "generalized" for simplicity. We expect this problem to be
particularly hard since, as we have seen in chapter 2, the direct problem of finding
the magnetizations from the model's parameters is already non-trivial.

3.3.1 Monte Carlo optimization 

One can use the fact that the direct problem is numerically solvable with Monte
Carlo (MC) methods [Krauth 06] to solve the inverse problem with the following
algorithm [Ackley 85]:

1. start with an initial guess for the parameters $J^0_{ij}$ and $h^0_i$.

2. run a Monte Carlo simulation to find the set of magnetizations $m^t_i$ and
correlations $C^t_{ij}$ corresponding to these parameters.

3. update $J_{ij}$ according to $J^{t+1}_{ij} = J^t_{ij} + \eta(t)(C_{ij} - C^t_{ij}) + \alpha(J^t_{ij} - J^{t-1}_{ij})$ for some
chosen function $\eta$ and constant $\alpha$.

4. update $h_i$ analogously to $J_{ij}$.

5. repeat steps 2-4 until $\max_{ij}|C_{ij} - C^t_{ij}| < \epsilon$.

The number of steps needed to reach a certain accuracy depends on the
initial parameters, the function $\eta$ and the parameter $\alpha$. An important drawback of
this algorithm is that it is very inefficient: at each step one must run a Monte Carlo
simulation, which is very time-consuming if one needs an accurate result. There are
other modified versions of this algorithm that reduce the number of necessary
steps [Broderick 07], but they all involve a MC simulation at every step and
thus have the same drawbacks.
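The update loop above can be sketched as follows. To keep the example deterministic and short, step 2 is not a Monte Carlo simulation here: the moments are computed by exhaustive enumeration, which is only feasible for a handful of spins, and $\eta(t)$ is taken constant. The system size, the "true" parameters used to generate target moments, $\eta = 0.1$ and $\alpha = 0.5$ are all illustrative assumptions, and the function names are hypothetical.

```python
import itertools
import numpy as np

def exact_moments(J, h):
    """<s_i> and <s_i s_j> by exhaustive enumeration (replaces the MC step; small N only)."""
    S = np.array(list(itertools.product([-1, 1], repeat=len(h))), dtype=float)
    log_w = 0.5 * np.einsum("ki,ij,kj->k", S, J, S) + S @ h   # sum_{i<j} J_ij s_i s_j + sum_i h_i s_i
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w @ S, np.einsum("k,ki,kj->ij", w, S, S)

def boltzmann_learning(m_target, C_target, eta=0.1, alpha=0.5, tol=1e-8, max_iter=20_000):
    """Steps 1-5 above with constant eta and momentum alpha."""
    N = len(m_target)
    off = ~np.eye(N, dtype=bool)                  # couplings live off the diagonal
    J, h = np.zeros((N, N)), np.zeros(N)          # step 1: initial guess
    J_prev, h_prev = J.copy(), h.copy()
    for _ in range(max_iter):
        m, C = exact_moments(J, h)                # step 2
        if max(np.abs(C - C_target)[off].max(), np.abs(m - m_target).max()) < tol:
            break                                 # step 5: stopping criterion
        J_new = J + eta * (C_target - C) * off + alpha * (J - J_prev)   # step 3
        h_new = h + eta * (m_target - m) + alpha * (h - h_prev)         # step 4
        J_prev, h_prev, J, h = J, h, J_new, h_new
    return J, h

rng = np.random.default_rng(3)
N = 4
upper = np.triu(rng.normal(0.0, 0.3, size=(N, N)), k=1)
J_true, h_true = upper + upper.T, rng.normal(0.0, 0.2, size=N)
m_data, C_data = exact_moments(J_true, h_true)    # "measured" moments
J_fit, h_fit = boltzmann_learning(m_data, C_data)
print(np.max(np.abs(J_fit - J_true)), np.max(np.abs(h_fit - h_true)))
```

With a real Monte Carlo estimate in step 2, the moments carry statistical noise, so the achievable accuracy and the cost per iteration are set by the length of each simulation, which is the drawback discussed above.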

3.3.2 Susceptibility propagation 

In 2008, M. Mezard and T. Mora had the interesting idea of modifying the
message-passing algorithm we have seen in section 2.5.1 to solve the inverse Ising
problem [Mezard 08, Marinari 10].

In their paper, the authors first write the belief propagation equations that give
the values of $C_{ij}$ and $m_i$. They then reinterpret these equations by taking $\{J_{ij}, h_i\}$
as the unknowns and $\{C_{ij}, m_i\}$ as the input data, and describe a message-passing
procedure that converges to the right fixed point on trees. The details can be found
in [Marinari 10].

This procedure, like belief propagation for the direct problem,
is exact on trees and an approximation on graphs that contain loops. If it converges
(which is not guaranteed on graphs with loops), it does so in polynomial time, which
makes it much faster than the Monte Carlo optimization. The main drawback of
this method is that for graphs with loops the resulting approximate solution might
be very far from the optimal solution of the problem.
3.3.3 Inversion of TAP equations

Another approach for solving the inverse Ising problem was proposed by Roudi
et al. [Roudi 09]. Their starting point is the TAP equations we already saw in
section 2.3, written here with local fields $h_i$:

$$ \tanh^{-1}m_i = h_i + \sum_{j(\neq i)}J_{ij}m_j - m_i\sum_{j(\neq i)}J_{ij}^2(1-m_j^2). \tag{3.22} $$

Taking the derivative of this expression with respect to $m_j$ and noting that $(C^{-1})_{ij} =
\partial h_i/\partial m_j$, we have

$$ (C^{-1})_{ij} = -J_{ij} - 2J_{ij}^2m_im_j + \frac{\delta_{ij}}{1-m_i^2}, \tag{3.23} $$

which is easily solvable for $J_{ij}$ (for $i \neq j$ it is a quadratic equation).

Note that if $J_{ij}$ is small, we can neglect the $J_{ij}^2$ term, yielding an explicit solution
for the couplings:

$$ J_{ij} = -(C^{-1})_{ij}, \qquad i \neq j . \tag{3.24} $$

The inversion of Eq. (3.23) has the same strengths and drawbacks as the use of
the TAP equations in the direct problem: it is exact in the large-size limit for the
SK model, and we can expect it to work well only in models where the couplings
are small.
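Both estimates can be sketched directly from the magnetizations and the connected correlation matrix. The example below is a minimal sketch, assuming a small system ($N = 5$) with weak random couplings, so that exact moments can be obtained by enumeration instead of from data. It returns the mean-field estimate (3.24) and the root of the quadratic equation (3.23) that reduces to it when $m_im_j \to 0$. The function names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def exact_connected_stats(J, h):
    """m_i and connected c_ij by exhaustive enumeration (small N only)."""
    S = np.array(list(itertools.product([-1, 1], repeat=len(h))), dtype=float)
    log_w = 0.5 * np.einsum("ki,ij,kj->k", S, J, S) + S @ h
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    m = w @ S
    return m, np.einsum("k,ki,kj->ij", w, S, S) - np.outer(m, m)

def invert_couplings(m, c):
    """Mean-field estimate J = -(C^-1)_ij, Eq. (3.24), and the TAP estimate solving
    2 m_i m_j J^2 + J + (C^-1)_ij = 0, Eq. (3.23) for i != j, on the branch that
    reduces to the mean-field one when m_i m_j -> 0."""
    c_inv = np.linalg.inv(c)
    J_nmf = -c_inv
    mm = np.outer(m, m)
    with np.errstate(divide="ignore", invalid="ignore"):
        J_tap = (-1.0 + np.sqrt(1.0 - 8.0 * mm * c_inv)) / (4.0 * mm)
    J_tap = np.where(np.abs(mm) > 1e-12, J_tap, J_nmf)
    np.fill_diagonal(J_nmf, 0.0)
    np.fill_diagonal(J_tap, 0.0)
    return J_nmf, J_tap

rng = np.random.default_rng(4)
N = 5
upper = np.triu(rng.normal(0.0, 0.02, size=(N, N)), k=1)   # weak couplings (assumed)
J_true, h_true = upper + upper.T, rng.normal(0.3, 0.1, size=N)
m, c = exact_connected_stats(J_true, h_true)
J_nmf, J_tap = invert_couplings(m, c)
print(np.max(np.abs(J_nmf - J_true)), np.max(np.abs(J_tap - J_true)))
```

For larger couplings the square root can become imaginary, which is one concrete way the small-coupling assumption of this method shows up.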



3.3.4 Auto-consistent equations 

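Recently, a novel approach for finding the parameters of an Ising model was
proposed by the statistics community [Wainwright 10]. Its main idea rests on the
fact that for an Ising system the magnetization satisfies

$$ m_i = \frac{1}{Z}\sum_{\{\sigma_j\}_{j\neq i}}\exp\Big(\sum_{j<k;\ j,k\neq i}J_{jk}\sigma_j\sigma_k + \sum_{j\neq i}h_j\sigma_j\Big)\sum_{\sigma_i=\pm1}\sigma_i\,e^{\sigma_i\left(\sum_{j(\neq i)}J_{ij}\sigma_j + h_i\right)} = \Big\langle\tanh\Big(\sum_{j(\neq i)}J_{ij}\sigma_j + h_i\Big)\Big\rangle, \tag{3.25} $$

where we used the fact that $\sum_{\sigma=\pm1}\sigma e^{\sigma A} = \tanh(A)\sum_{\sigma=\pm1}e^{\sigma A}$. An analogous
expression can be derived for the correlations:

$$ \langle\sigma_i\sigma_j\rangle = \Big\langle\frac{A_{ij}(\sigma,+,+) - A_{ij}(\sigma,+,-) - A_{ij}(\sigma,-,+) + A_{ij}(\sigma,-,-)}{A_{ij}(\sigma,+,+) + A_{ij}(\sigma,+,-) + A_{ij}(\sigma,-,+) + A_{ij}(\sigma,-,-)}\Big\rangle, \tag{3.26} $$

where

$$ A_{ij}(\sigma,\tau,\rho) = \exp\Big[\tau\rho J_{ij} + \tau\sum_{k(\neq i,j)}J_{ik}\sigma_k + \rho\sum_{k(\neq i,j)}J_{jk}\sigma_k + \tau h_i + \rho h_j\Big]. \tag{3.27} $$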


(3.27) 
Suppose now that we have a set of $L$ independent measurements of the full spin
configuration of our system, $\{\sigma^1,\dots,\sigma^L\}$, with $\sigma^l = \{\sigma_1^l,\dots,\sigma_N^l\}$. We can then estimate
Eqs. (3.25) and (3.26) from the data:

$$ \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l = \frac{1}{L}\sum_{l=1}^{L}\tanh\Big(\sum_{j(\neq i)}J_{ij}\sigma_j^l + h_i\Big) \tag{3.28} $$

and, respectively,

$$ \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l\sigma_j^l = \frac{1}{L}\sum_{l=1}^{L}\frac{A_{ij}(\sigma^l,+,+) - A_{ij}(\sigma^l,+,-) - A_{ij}(\sigma^l,-,+) + A_{ij}(\sigma^l,-,-)}{A_{ij}(\sigma^l,+,+) + A_{ij}(\sigma^l,+,-) + A_{ij}(\sigma^l,-,+) + A_{ij}(\sigma^l,-,-)}. \tag{3.29} $$

We thus have a system of coupled non-linear equations for $J_{ij}$ and $h_i$ which can be
solved without the need to evaluate the partition function $Z$.
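The identity behind this approach is easy to check numerically. The sketch below, with hypothetical random parameters on $N = 5$ spins, verifies Eq. (3.25) exactly by weighting all $2^N$ configurations with their Boltzmann probabilities. It then checks the empirical version, Eq. (3.28), on $L = 20\,000$ configurations sampled from that distribution, where the two sides agree up to sampling noise of order $L^{-1/2}$. Note that no partition function is needed to evaluate either side of Eq. (3.28); $Z$ appears here only to generate the synthetic data.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
N = 5
upper = np.triu(rng.normal(0.0, 0.5, size=(N, N)), k=1)
J, h = upper + upper.T, rng.normal(0.0, 0.3, size=N)     # hypothetical parameters

# Exact Boltzmann weights of all 2^N configurations.
S = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
log_w = 0.5 * np.einsum("ki,ij,kj->k", S, J, S) + S @ h
w = np.exp(log_w - log_w.max())
w /= w.sum()

local_fields = S @ J + h             # sum_{j != i} J_ij s_j + h_i (zero diagonal)
m_direct = w @ S                     # <s_i>
m_tanh = w @ np.tanh(local_fields)   # right-hand side of Eq. (3.25)

# Empirical version, Eq. (3.28), from L sampled configurations.
L = 20_000
samples = S[rng.choice(len(w), size=L, p=w)]
lhs = samples.mean(axis=0)
rhs = np.tanh(samples @ J + h).mean(axis=0)
print(np.max(np.abs(m_direct - m_tanh)), np.max(np.abs(lhs - rhs)))
```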

This procedure allows one to find the couplings from the measured data in
polynomial time, but it has a few drawbacks. First of all, it is not optimal
according to Bayes' theorem: it depends on correlations of all orders, while the
optimal Bayesian inference depends only on the magnetizations and correlations.
Accordingly, this method does not work if the Hamiltonian used to generate the data has
any three-body or higher-order couplings. This is particularly awkward for the case of
inferring neural synapses, where the hypothesis of an Ising model is only an
approximation. Finally, solving the set of equations for $J_{ij}$ is a non-trivial problem. The
original paper [Wainwright 10] proposes an algorithm to solve it that unfortunately
does not work in the low-temperature regime.
Some results on the inverse Ising problem

Chapter 4



The inverse Ising problem in the 
small-correlation limit 



In section 3.3, we introduced the inverse Ising problem and discussed what
has been done in the literature to solve it. In this chapter, we propose a
small-correlation expansion procedure that allows one to find the couplings and
fields up to any given power of the correlations [Sessak 09]. We will find
an explicit expression for the couplings and fields that is correct up to
$O((\text{largest connected correlation})^3)$.

We consider the generalized Ising model for a system composed of $N$ spins $\sigma_i =
\pm1$, $i = 1,\dots,N$, whose Hamiltonian is given by

$$ H(\{\sigma_i\}) = -\sum_{i<j}J_{ij}\sigma_i\sigma_j - \sum_ih_i\sigma_i , \tag{4.1} $$

as we have already introduced in chap. 2, in Eq. (2.9). We want to find the values
of the couplings $J^*_{ij}$ and fields $h^*_i$ such that the average values of the spins and of the
spin-spin correlations match the prescribed magnetizations $m_i$, given by

$$ m_i = \langle\sigma_i\rangle , \tag{4.2} $$

and connected correlations $c_{ij}$, defined by

$$ c_{ij} = \langle\sigma_i\sigma_j\rangle - m_im_j . \tag{4.3} $$

For given fixed magnetizations and correlations, the entropy of the generalized Ising
model, obtained in section 3.1, is given by

$$
\begin{aligned}
S(\{J_{ij}\},\{h_i\};\{m_i\},\{c_{ij}\}) &= \log Z(\{J_{ij}\},\{h_i\}) - \sum_{i<j}J_{ij}(c_{ij} + m_im_j) - \sum_ih_im_i \\
&= \log\sum_{\{\sigma_i\}}\exp\Big\{\sum_{i<j}J_{ij}\big[\sigma_i\sigma_j - c_{ij} - m_im_j\big] + \sum_ih_i(\sigma_i - m_i)\Big\} \\
&= \log\sum_{\{\sigma_i\}}\exp\Big\{\sum_{i<j}J_{ij}\big[(\sigma_i - m_i)(\sigma_j - m_j) - c_{ij}\big] + \sum_i\lambda_i(\sigma_i - m_i)\Big\},
\end{aligned} \tag{4.4}
$$

where the new fields $\lambda_i$ are simply related to the physical fields $h_i$ through $\lambda_i =
h_i + \sum_{j(\neq i)}J_{ij}m_j$. In that same section, we saw that the couplings $J^*_{ij}$ and fields
$h^*_i$ are the ones that minimize the entropy. As discussed in chap. 2, the exact
evaluation of the entropy of Eq. (4.4) for a given set of $J_{ij}$ and $\lambda_i$ is, in
general, a computationally challenging task, let alone its minimization. To
obtain a tractable expression, we multiply all connected correlations $c_{ij}$ in Eq. (4.4)
by the same small parameter $\beta$, which can be interpreted as a fictitious inverse
temperature. Our entropy is thus

$$ S(\{J_{ij}\},\{\lambda_i\};\{m_i\},\{\beta c_{ij}\}) = \log\sum_{\{\sigma_i\}}\exp\Big\{\sum_{i<j}J_{ij}\big[(\sigma_i - m_i)(\sigma_j - m_j) - \beta c_{ij}\big] + \sum_i\lambda_i(\sigma_i - m_i)\Big\}. \tag{4.5} $$

In this chapter we want to expand the entropy in powers of $\beta$ as a function of the
magnetizations and correlations:

$$ S(\{m_i\},\{\beta c_{ij}\}) = S^0 + \beta S^1 + \beta^2S^2 + \dots \tag{4.6} $$

Accordingly, $J^*_{ij}$ and $\lambda^*_i$ can also be written as series in $\beta$:

$$ J^*_{ij}(\{m_i\},\{\beta c_{ij}\}) = J^0_{ij} + \beta J^1_{ij} + \beta^2J^2_{ij} + \dots, \tag{4.7} $$

$$ \lambda^*_i(\{m_i\},\{\beta c_{ij}\}) = \lambda^0_i + \beta\lambda^1_i + \beta^2\lambda^2_i + \dots, \tag{4.8} $$

where we omit the dependence of the terms on $\{m_i\}$ and $\{c_{ij}\}$ to lighten notation.
The entropy we are looking for is obtained by setting $\beta = 1$ in
the expansion. Since the parameter $\beta$ multiplies every $c_{ij}$, we have
$S^k = O(c_{ij}^k)$. We can thus expect our expansion for $S$ to converge for
small enough correlations. Note that once we have expressed the entropy as a series in
$\beta$, we can retrieve an expansion for the couplings and fields using the following identities,
which follow from the definition of the entropy:

$$ \frac{\partial S(\{m_i\},\{\beta c_{ij}\})}{\partial c_{ij}} = -\beta J^*_{ij}(\{m_i\},\{\beta c_{ij}\}) \tag{4.9} $$

and

$$ \frac{\partial S(\{m_i\},\{\beta c_{ij}\})}{\partial m_i} = -\lambda^*_i(\{m_i\},\{\beta c_{ij}\}). \tag{4.10} $$

Thus, once we have found an expansion for $S$, it is straightforward to deduce from it an
expansion for $J^*_{ij}$ and $\lambda^*_i$.

The calculation of the entropy S({rrii}, { /3 c^- } ) is straightforward for (3 = since 
spins are uncoupled in this limit. In this case, the values of the couplings and fields 
minimizing the entropy are thus 

J°=0 and A° = tanh" 1 ^) . (4.11) 

Accordingly, the entropy for (3 = is 



1 + TOj 1 + TOj 1 — rrii 1 — rrii 
In 1 In 



(4.12) 



36 



CHAPTER 4. THE INVERSE ISING PROBLEM IN THE SMALL-CORRELATION 

LIMIT 



To find the non-trivial terms of the entropy we proceed in the following way: 
first we define a potential U over the spin configurations at inverse temperature (3 
through (note the new last term) 



({<*» = E [fa - m *)fa - m ^ - p ^ + E A *(^)fa - 

i<j i 

E% f d/3'J*.(/3'), 
Jo 



(4.13) 



+ 

i<j 



and a modified entropy (compare to Eq. (4.4)) 

S({m t },{c tJ },P) =logEe l7({CTl}) • (4.14) 

W 

Notice that C/ depends on the coupling values J*j((3') at all inverse temperatures 
P' < (3. The true entropy (at its minimum) and the modified entropy are simply 
related to each other through 

5({m,},{Q,},/3) = 5({m,},{ Qj },/3)-E^ / ^ J W) ■ ( 4 - 15 ) 

The modified entropy S in Eq. (4.14) was chosen to be independent of f3. Indeed, 
it has an explicit dependence on (5 through the potential U (Eq. (4.13)), and an 
implicit dependence through the couplings and the fields. As the latter are chosen 
to minimize S, the full derivative of S with respect to (3 coincides with its partial 
derivative, and we get 

% = % = ~ £ <* J ^ + E <% -m = • ( 4 - 16 ) 

i<j i<j 

The above equality is true for any (3. Consequently, S is constant and equal to its 
value at (3 = 0, S°, given in Eq. (4.12). 

In the following, we will use the fact that S does not depend on (3 to write self- 
consistency equations from which we will deduce our expansion. We will start by 
presenting S l and S 2 since their calculations differ from those used for higher orders. 
Afterwards, we will present the calculations for S 3 as a generalizable example of the 
general method, which will be presented in the sequence. 

4.1 Evaluation of S 1 and S 2 

To find S 1 , we derive Eq. (4.15) with respect to (3: 



b ~ dj3 



OS 

d/3 



-E^-(O), 

i<3 

0, (4.17) 
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since S does not depend on p (Eq. (4.16)) and J* (0) = J°- = (Eq. (4.11)). A 



direct consequence of this, deriving from Eq. (4.10), is 

OK 

d/3 



= A 1 . 



(4.18) 



To evaluate the next term S 2 , we note that since = for any /3, we have 

in particular that 



E{(j} e ^(wjj) 9/3 " \ d p 

Evaluating Eq. (4.19) explicitly for ft = yields 



dS 
dp 



(4.19) 



i<j 



i<j 



dp 



(((7j — mi)((jj — rrij)) , 



(4.20) 



This equality is trivial, since we know that «/* (0) = and the averages also vanish 
since the spins are uncoupled for p = 0. We must thus look at the second derivative 
of S: 

'dUV 



op 2 \ op 2 

Explicitly, the first term corresponds to 



dU\ 
dp) 



dp I ' 



(4.21) 



d 2 U 
dp 2 



d 2 J* 



d 2 \* 



= Yl fa - m *) fa - m j) + J2 ~dpt fa ~ m i) 



^ dp 



-E 

i<j 



dp 



d 2 T* 

i<j 



which for P = yields 



d 2 U 



dp 2 



i<j 



dp 



(4.22) 



(4.23) 



The next one is given by 



(d_U_ 



^ ^ ~^-^t fa - m i) fa - m j) fa - rn k ) (ai - mi) 



i<j k<l 



dp dp 



^A* dX *J / 

+ 2^ ^"^"fa ~ mi )fa - m i) 



9/3 dp 



i<j k 

which for p = reduces to 

dU 

~d/3 



dp dp 



(4.24) 



E 



(dJ\ 



- 



XI 



i \ dp 



(1 ""»?)(! ""»?), 



(4.25) 
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where we used in the last equation the fact that 9/3 

last term vanishes as consequence of Eq. (4.19). 
Finally, we can rewrite Eq. (4.21) as 



A- = (Eq. (4.18)). The 



f) T* 

ij d(3 



E 



- 



'J 



, . ' V dp 



(1 ""»?)(! 



(4.26) 



whose simpler solution is given by 



d(3 



(1 - mf)(l - rri]) 



= Jh ■ 



(4.27) 



Using Eqs. (4.15) and (4.10) we can now deduce 



_ 1 y" c ^ 

2^(l-m?)(l- 



and 



<9 2 A* 



<9/3 2 



mf)(l - m 2 ) 2 



= 2A 2 



(4.28) 



(4.29) 



Finally, we have the value of A 2 , S and our first non-trivial estimation of J*j. We 
can verify the correctness of Eq. (4.27) by noting it is the first-order approximation 
of Eq. (3.7) for small c. 

4.2 Evaluation of S 3 

Like in previous section, we calculate the third derivative of S with respect to (3: 
=S = /SWSW(Sl >• («0) 



<9/3 3 \ df3 3 I ' " \ <9/3 2 9/3 / ' \ V -9/3 
which yields, after evaluating the averages (see appendix A): 

(1 -m 2 ) 2 (l-m 2 ) 2 



f)2 j* 



<9/3 2 



i<jr v »/ \ J ' i<j<k 

Taking the third derivative of Eq. (4.15), we can show that 

d 3 S 



(l-m?)(l-mj)(l-i»J) 



E' 



Q2 j* 



dp 



d(3 3 



6S 6 . 



(4.31) 
(4.32) 



Comparing the two last equations, we finally find the expression for S 3 : 



Ettt 



C-jTTlimj 



t<3 



m 2 ) 2 (l — m 2 ) 2 



i<j<k v 



-m 2 )(l -m 2 )(l -m 2 ) 



(4.33) 
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4.3 Higher orders 



The expansion procedure can be continued order by order using the same pro- 
cedure as in section 4.2. To evaluate S k having already evaluated all S n for n < k, 

one must evaluate Cj- as a sum of averages with respect to uncoupled spins. After 

p o 

evaluating explicitly the averages, one will have 



d k S 



ak-1 j* 



13 d^- 1 



i<j 



+ Qk , 



(4.34) 



where Q k is a (known) function of the magnetizations, correlations, and of the deriva- 
tives in (3 = of the couplings </*• and fields A* of order < max(l,A; — 2). See 
Appendices A and B. 

Finally, as S is constant by virtue of Eq. (4.16), both sides of Eq. (4.34) vanish. 
Using Eq. (4.15), we have 

Qk 



S k = - 



k\ ' 



(4.35) 



which allows then to find the fields and couplings using Eqs. (4.9-4.10). 

Using this procedure, we could go up to S* 4 (details are on Appendix A). Using 
the notations 



Li = (((7j — mj) ) fl — 1 — ml, (4.36) 
which is basically the variance of an independent spin of average rrii and 



K 



:i-s. 



U ) 



(fa -rrn) 2 )^^ -m j ) 2 ) Q 



'1-5 



L{Lj 



(4.37) 



where we have multiplied our definition of by one minus a Kronecker symbol so 
that Ka = 0, what makes our notations simpler. With these definitions, we have 



s = -E 



1 + rrii 1 + mi 1 — m ; 1 — 
m 1 In 



(3 2 2 
-y K ij L i L j + E K l-m i mjL, i L j + /3 3 K ij K ik K hi L i L, J L k 

i<j i<j i<j<k 

i<j i<j k 

-/3 4 ^ {KijKjkKkiKu + K ik K kj KijKii + KijKjiKi k K ki )LiLjL k Li 

i<j<k<l 

+0(/3 5 ) . (4.38) 
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The result for J*- is 



+ \(3 Z Kl [l + 3m 2 + 3m? + 9™^] 

+(3 3 K^KuKuLkU + OiP 4 ), (4.39) 



and the physical field is given by 

_^3 (1 + 3m 2 } Kf.m,-^ - 2/3 3 m z ]T K l3 K jk K kl L,L k 
+2/3 4 m ; KikKkjKjiKuLiLjLk 

i<j k 

+fi 4 mi J2 K tj L 3 [l + mf + 3m 2 + 3m, 2 77$ 
i 

+/3 4 m / £ £ KfjK^LiL^ + 0(/3 5 ) . (4.40) 



4.4 Checking the correctness of the expansion 

As we can see in Appendix A, the calculations for getting to Eq. (4.38) are long 
and error-prone. In this section, we will look at the different methods used to verify 
the correctness of these calculations. 



4.4.1 Comparing the values of the external field with TAP 
equations 

In section 2.3, we presented an expansion of the free energy of the direct Ising 
model for small couplings. The first two orders were developed by Thouless et al. 
to solve the SK model, and are given in Eq. (2.30). In 1991 A. Georges and J. 
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Yedidia [Georges 91] published the next two orders of this expansion. They found 



ajrffar \ { \\ ST 1+m M 1 + . l-rrii ^ 



i<j 



(3 2 2(3 3 
+ JijLiLj H — jfjinirajLiLj + /3 3 JijJjkJkiLiLjLk 

i<j i<j i<j<k 

- ^E4( 1 + 3m ' + 3m ?- 15m H 2 ) L ^' 

+ 2[3 4 YY J ir J i kJkimim i LiL i Lk 

i<j k 

+ /3 4 (JijJjkJklJli + JikJkjJljJil + Jij J jlJlkJki) LiLjLkLl ■ 

i<j<k<l 

(4.41) 

From this result, we can derive the external fields as a function of and mj 
through: 

h t ({J t3 },{ mi }) = —. (4.42) 
For example, up to J 2 , we have 

fci = \ ^ ([^) - E J a m i + E 4™^- + °( j3 ) ■ ( 4 - 43 ) 

3 3 j 

We would like to compare this equation to our result for /ij ({%■}, {mj}), given in 
Eq. (4.40), in order to check the correctness of our expansion. To rewrite Eq. (4.43) 
as a function of {%}, we use the expansion for Jij({c}, {m}) obtained by us, J^ = 
Kij + 0(c 2 ). We rewrite then Eq. (4.43) as 

hi = \ In (j^j ~ E J v m > + E X 5™^ + °( c3 ) > ( 444 ) 

which corresponds exactly to the first three terms of Eq. (4.40). We followed the 
same procedure using all the terms of Eq. (4.41) and the expansion of J i3 - . We could 
verify then all the orders of Eq. (4.40). The details are in Appendix C. 



4.4.2 Numerical minimum-squares fit 

In this section, we present a method to verify our expansion for S given in 
Eq. (4.38) numerically. For that, we rewrite our result in a slightly more general 
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way, introducing the coefficients {a\, . . . , a^}: 



1 + rrii 1 + rrii 1 — rrii 1 — rrii 
In — : 1 : — m 



2 2 2 2 

+ ai ^2 + °2 ^2 Kij m i m jLiLj + a 3 ^ KijKj k K ki LiLjL k 

i<j i<j i<j<k 

+ a 4 J2 K i [1 + + 3m 2 + 9m 2 m 2 ] L t L 3 + a 5 J^J2 K l K h L l L i L i 

i<j i<j k 

+ a 6 ^ {KijK jk K k iKii + K ik K k jKijKii + KijKjiKi k K ki )LiLjL k Li . 

i<j<k<l 

(4.45) 

We would like to obtain these coefficients through a numerical fit using data gener- 
ated by exact enumeration. Afterwards, we can verify if these results match those 
derived formerly in this chapter. 

We proceeded in the following way: 

1. We choose randomly {cy} and {rrii} for a system with = 5 spins. Both the 
correlations and the magnetizations {mj are chosen randomly with an 
uniform distribution in the interval [ — 10~ 12 , 10~ 12 ] and [—1,1], respectively. 
The values of are very small so that the terms on c k+1 in the expansion of 
S are negligeable with respect to those in c k . 

2. We find numerically the minimum S num of the entropy S with respect to 
and hi. This calculation has to be done with a very large numerical precision 
to account for the very small values of Cy. We used 400 decimal units. 

3. We repeat steps 1 and 2 for different samplings of and {rrii} to evaluate 
D = ((S dcv — S num ) 2 ) r , , , . In our case, we used 60 different random values 

W / l{cij},{mi) ' 

of {cij} and {rrii}. 

4. We find the set of a = {ai, . . . ,a^} that minimizes D. Note that since D 
is a quadratic function of the coefficients a«, this method can still be done 
efficiently if we go further in the expansion and have a much larger set a. 

The obtained values of {ai, . . . , a 6 } (see table 4.1) show a very good agreement 
with Eq. (4.38), giving support to our derivation. 



Constant 


ai 


a 2 


a 3 


Gt4 


a 5 


a 6 


Error 


3.7- 10- 32 


5.4 • 10- 20 


2.1 • HT 18 


8.2- 10~ 12 


6.7- 10- 8 


3.8- KT 6 



Table 4.1: Agreement between theoretical and numerical values of {ai, . . . , ae} 
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Chapter 5 



Further results based on our 
expansion for the inverse Ising 
model 



In this chapter, we will see some useful results that follow from the expansion 
made in the last chapter. In particular, we will sum some infinite subsets of the 
expansion, what will make the expansion more robust. 

To make some results in the following more visual, we will introduce a diagram- 
matical notation. A point in a diagram represents a spin and a line represents a 
(3Kij link. We do not represent the polynomial in the variables {rrii} that multiplies 
each link. Summation over the indices is implicit. Using these conventions, we can 
write our entropy as: 




We can also represent </*• diagrammatically, with the difference that we connect 
the % and j sites with a dashed line that do not represent any term in the expansion. 
The summation over indices are only done in sites that are not connected by a dashed 
line. We obtain 
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5.1 Loop summation 



If we rewrite Eq. (4.38) in a slightly different form, a particular subset of the terms 
in the expansion seems to follow a regular pattern (mind the last three terms): 



« = -E 



1 + mi . 1 + rrii 1 - m, n 1 — m, 
In 1 In 



2 ^ 



i<j 



i<j 



^ J KfjLiLj + (3 3 ^2 KijKjkKkiLiLjLk 



i<j 



i<j<k 



y~] KijKjkKkiKuLiLjLkLi 



i,j,k,l 
?5\ 



where we have used the identity 

P 4 



(5.3) 



— /3 4 £ {KijKj k K k iKu + K ik K k jKijKu + KijKjiKi k K ki )LiLjL k Li . 

i<j<k<l 



P 4 



(5.4) 



The last three terms of Eq. (5.3) can be written in a different form: 



/3 2 



■S' 1 "" 1 ' - 'j £ KijLiLj + — £ l\jjl\ jk l\ k ;I.;I.jI. k 



h3 



P 4 



/3 2 



^ KijKj k K k iKnLiLjL k Li , 

i,j,k,l 







P 4 



Tr(M ) + — Tr(AT) - — Tr(M ) , 



6 



8 



(5.5) 



where M is the matrix defined by My = Kij^yT^Lj and we will justify in the 
following the notation S loop . Since Ku = 0, we have TrM = 0, which implies that 
Eq. (5.5) can be rewritten as 



Cj<loop 



-TrfpM- —M 2 + —M 3 - — M 4 
2 I 2 3 4 



(5.6) 



We now make the hypothesis that if we continue this expansion to higher orders on 
(3 we will found all the other terms on (—M) k /k. Thus, 



gloop 



-TrfpM- —M 2 + ^M 3 - — M 4 + 
2 V 2 3 4 

Tr [log(l + pM)\ = log [det(l + /3M)\ . 



(5.7) 
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In diagrammatic terms, Eq. (5.7) corresponds to summing all single- loop diagrams 
and their possible contractions (see Eq. (5.4) for example): 



gloop 




1 

2 




+ 




+ 



(5.8) 



From Eqs. (5.7) and (4.9), we can also derive a formula for the contribution j]° op 



of ,S loop to J*,: 



J 



loop 



1 -m?)(l -m 2 ) 



(5.9) 



which is exactly the formula for the mean-field approximation found in Eq. (3.24). 
This is not only a very good evidence that the hypothesis we used on Eq. (5.7) is 
correct but also gives a physical interpretation to S loop . Finally, we can combine this 
result with the previous ones, yielding: 



S = ,S loop -^ 



1 + m, 1 + to, 1-mj. 1 - 
In 1 In 



i<3 



(5.10) 



i<3 



and 

j. = J^ P _ 2/3 2 mim . K 2 

-\p z Kl [I - 3m, 2 - 3m 2 - 3m 2 m 2 ] + 0(/3 4 ) . (5.11) 

Note that the infinite series shown in Eq. (5.7) is divergent when one of the 
eigenvectors of M is greater than one, while Eqs. (5.7) and (5.9) remain stable for 
all positive eigenvalues of M. In practical terms, the loop summation is much more 
robust for inferring the couplings than the simple power expansion in Eq. (4.38). In 
the next section, we propose a simple numerical verification of our hypothesis that 
confirm this assertion. 



Numerical verification of our series expansion and the loop sum 

We have tested the behavior of the series on the Sherrington-Kirkpatrick model 
in the paramagnetic phase. We randomly drew a set of x (A^ — 1)/2 couplings Jf- ue 
from uncorrelated normal distributions of variance J 2 /N. From Monte-Carlo sim- 
ulations, we calculated the correlations and magnetizations, inferring the couplings 
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J*j from Eqs. (4.39) and (5.11) and compared the outcome to the true couplings 



through the estimator 



N(N-1)J 2 



E(4-4T) ; 



(5.12) 



i<j 



The quality of inference can be seen in Figure 5.1 for orders (powers of 0) 1,2, and 
3 (corresponding respectively to the symbols +, x and □). For large couplings the 
inference gets worse as the order of the expansion increases due to the presence of 
terms with alternating signs in the expansion as discussed. Indeed, in the inset we 
show that for J m 0.3 the highest eigenvalue approaches 1. In this figure, we plot 
also the value for J*- obtained from Eq. (5.11) (as circles) and we can clearly see 
that it outperforms the other formulas. 



□ 
o 




+ 

X 

□ 
o 



+ 

X 



+ First order 
X Second order 
□ Third order 
O Loop 

I 



X 

□ 



Figure 5.1: Relative error A given in Eq. (5.12) on the inferred couplings as a function 
of the parameter J of the Sherrington-Kirkpatrick model with iV = 200 spins. Monte 
Carlo simulations are run over 100 steps. Averages and error bars are computed over 100 
samples. Top: orders /3, /3 2 and /3 3 of the expansion. Bottom: expression (5.11) which 
includes the sum over all loop diagrams. Inset: largest eigenvalue A of matrix M as a 
function of J. 



If we try to use the same procedure to test the performance of our loop summation 
formula, we get results like the ones shown in Figure 5.2: 
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Figure 5.2: Absolute error J x A on the inferred couplings as a function of the parameter 
J of the Sherrington-Kirkpatrick model. Inference is done through formula (5.11), which 
takes into account all loop diagrams. The error decreases with the number of spins and 
the number of Monte Carlo steps (shown on the figure). 

We can clearly see above that the error on the inferred couplings for the Sherrington- 
Kirkpatrick model is essentially due to the noise in the MC estimates of the corre- 
lations and magnetizations, since it decreases with the number of steps. 

5.2 Combining the two-spin expansion and the 
loop diagrams 

In the previous section, we have identified a set of diagrams whose sum yields 
the mean-field approximation we saw in chapter 2. In section 3.1, we have inferred 
exactly the value of J for a system composed of only two spins (Eqs. (3.6-3.9)). We 
will see that it is easy to identify summable diagrams also in this two-spin case. Our 
system is composed only by two spins i and j, thus there can be no diagrams of 
more than two vertices in the expansion of S. Moreover, since the formula is exact, 
the expansion in the case of two spins contains all the two-spin diagrams. Indeed, 
the first four terms of the Taylor expansion of Eq. (3.7) on small Qj are 

Jij = fSKij - 2m i m j Kf j + [l + 3m 4 2 + 2>m) + %m\m 2 ^ + ... , 

= • • • • ' • • . (5.13) 

It is easy to identify this formula as the two-spin diagrams in Eq. (4.39). Using the 
explicit formula for S 2 ~ spm given in Eq. (3.6) and applying Eqs. (3.7-3.9), we obtain 
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s. 



2-spin 

ij 



^jfl-spin _|_ ^fl-spin 



where 



1, 

+ 4 log 

+ 4 log 



gl-spin 



1 + 

1 - 

1 - 

1 + 



(1- 


mi)(l - 


rrij) 








(1- 


"0(1 + 


rrij) 




Cij 




(1 + 


mi)(l - 


rrij) 




Cij 





[1 + mj)(l + rrij) 



[Cij + (1-7710(1-771^-)] 

[cij - (\-mi)(\ + mj)\ 

[Cij - (1 + 7710(1-771^-)] 

(l + mO(l + m j )], (5.14) 



1 + m,, 1 + rrii 1-m,, 1 — rrii 



In 



In 



(5.15) 



We have now an explicit formula for the sum of all two-spin diagrams. To go as 
further as possible in our expansion, we would like to sum both all the loop and 2- 
spin diagrams. To combine Eq. (5.14) with Eq. (5.7), we need to remove the diagrams 
that are counted twice, since the loop expansion contains two-spin diagrams (see for 
ex. the first diagram in Eq. (5.8)). To evaluate the two-spin diagrams of S loop , 
we can simply evaluate it for the particular case of N = 2, where all the diagrams 
involving three or more spins are zero. Thus, 



gloop and 2-spin _ fog 

2 



det 



log (1 - K*LiL s ) 



Finally, we can write an equation combining both sums: 



(5.16) 



-fl-spin 



g2-spin + loop _ \ gl-spin . \ r I g 2-spin g±-spm g 

/ j ' / j I ij i 3 

i i<j 



1-spin 



-^-^log(l-XjL^.). 



(5.17) 



i<j 



Note that this formula contains all diagrams shown in Eq. (4.38). The corresponding 



formula for J,* is 



J 



*(2-spin+loop) 



K 



T*loop , r*2-spin _ 

ij ij ~ 1 - KfjLiLj 



(5.18) 



where, as we have already seen in Eq. (3.7), 



J, 



* (2-spin) 



ln[l + K i:j (l + mO(l + mj 



+ -ln [l + ^(l-mO(l-m,)] 



- K i:j (l - mi )(l + rrij)} 



-- In [1-7^(1 + 7710(1-771,)] 



(5.19) 
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5.2.1 Three spin diagrams 



In the case of a system with a zero local magnetization, we can find a rather simple 
expression for couplings Jy of a system composed of only three spins <7j,<7 j,(Tk '■ 



T*3-spin 

J ij;k 



log 
log 



1 Cij Cjfc CjTj 

1 Cij ^ik Cjk 

1 Cij Qfc Cjk 

1 Cij Cj/j -|- CjTj 



+ i log 



1 -\- Cij ~\- Cifc ~\~ Cjk 



-%3 



(5.20) 



Proceeding in the same way we did with the two-spin diagrams, we can combine this 
formula with the previous results: 



J, 



2-spin+loop+3-spin j-*2-spin+loop 



U 



l 3 



_|_ \ j-*3-spin \ J j2 spins 

/ ^ ij'Js / j | ij 

k{k+i,k+j) k(k^i,k^j) I 



(5.21) 



+ 



Cij CikCjk 



^3 



1 C ij C ik C jk + 2 c ij c jkCki 1 c 



5.3 Quality of the inference after summing the 
loops and 2-3 spin diagrams 

In this section, we will look at how our results perform for two different well- 
known models. First we will look analytically at the one-dimensional Ising model 
and afterwards we will see numerical results for the Sherrington-Kirkpatrick model. 
We will test both the inference using just the loop diagrams we saw in Eq. (5.9), the 
combination of loops and two-spin diagrams we saw in Eq. (5.18) and the combina- 
tion of loops, 2-spins and 3-spins diagrams we saw in Eq. (5.21). 

5.3.1 One-dimensional Ising 

For the one-dimensional Ising model, we can evaluate exactly the coupling as a 
function of the correlations (see Eq. (2.5)): 



J] 



hi 



(5 



k+i,i 



+ S 



tanh 1 ( cij 3 



(5.22) 



where 5ij is the Kronecker symbol. Note that the obtained value of J^i should 
not depend on the pair of sites i,j chosen. Using our formula for jfj Spm given in 
Eq. (5.19), we have 



j2: s P in = tanh" 1 c tJ = tanh" 1 



(tanh J) 



\i-j\ 



(5.23) 



which predicts correctly the values of the couplings between closest neighbors J^i+i = 
J, but gives an non-zero result for the other couplings. On the other hand, using 
the loop summation formula from Eq. (5.9), we get 



rloop 

J ij — 



(5.24) 
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where c = = tanh J and 5ij is the Kronecker function. The loop sum correctly 
predicts that the model has only closest-neighbor couplings but does not predict 
correctly its value. 

Finally, using both the 2-spin diagrams and the loop sum (see Eq. (5.18)), we 
have 



^spin+loop = j {5 .. +i + k ._ i) + 



tanh dj — - — ^-j- 
1 ~~ c ij ■ 



J(6i,i+i + + 0(c 6 ) , 



(1 - 5i,i+i)(l - 8i,i-i) 

(5.25) 



which is correct to the order 0(c 6 ). As we will see in the following, the next con- 
tribution to the couplings coming from the expansion in Eq. (5.28) corresponds to 

whose leading term is indeed proportional to c^+2 • c? i+1 • cf +1<i+2 = c 6 . 



5.3.2 Sherrington-Kirkpatrick model 



In section 5.1, we saw that when we used Monte-Carlo simulations to evaluate the 
quality of the inference for the SK model we were limited mostly by the numerical 
errors of the MC simulation. To have more precise values, we now evaluate the 
error due to our truncated expansion using a program that calculates through 
an exact enumeration of all 2 N spin configurations. We are limited to small values 
of N (10, 15 and 20). However the case of a small number of spins is particularly 
interesting since, for the SK model, the summation of loop diagrams is exact in the 
limit N — > oo, as we discussed in section 2.3. The importance of terms not included 
in the loop summation is thus better studied at small N. 

We compared the quality of the inference using the loop summation (Eq. (5.9)), 
the combination of loop summation and all diagrams up to three spins (Eq. (5.21)) 
and the method of susceptibility propagation we discussed in section 3.3.2. Results 
are shown in Figure 5.3. The error is remarkably small for weak couplings (small J), 
and is dominated by finite-digit accuracy (10~ 13 ) in this limit. Not surprisingly it 
behaves better than simple loop summation, and also outperforms the susceptibility 
propagation algorithm. 
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Figure 5.3: Relative error A (Eq. (5.12)) as a function of J for the SK model for our 
summation J? : spin + loop + 3 spin (Eq. (5.21)) compared to the Susceptibility Propagation 
method of Mezard and Mora [Mezard 08] and loop resummation j] oop (Eq. (5.9)). 



5.4 Numerical evaluation of high-order diagrams 

In the two last sections we saw that summing all loop diagrams improved consid- 
erably the robustness of the inference. In this section, we will try to find some more 
terms numerically, in the hope that we might find other classes of exactly summable 
diagrams. We will proceed in a similar way as in section 4.4.2, where we used a 
numerical fit to validate our expansion in small /3. We will use the same method to 
find new diagrams in the expansion by guessing a general form of the lowest-order 
missing terms and finding numerically their coefficients. 

We started by defining a list of several possible corrections to Eq. (5.17) (since 
Eq. (5.21) is only valid for rrii — 0) such as 

^2-spin + loop 

+ ^2 KijKjkKkii a i + a 2(n2i + rrij) + a 3 (rrij + mi)m k + a^mirrij 

i,j,k 

+ a 5 (m^ + m|) + a 6 m 2 k + a 7 (m\ + m|) 
+ a%m\ + agmimjm k + a 10 mimjml] . 

Note that in the same way as in Eq. (4.38), the numerical coefficients in the expansion 
must be a fraction of small integers, as a consequence of our expansion procedure. 

We followed the same method described in section 4.4.2 for each one of our 
guesses. We found that for all of them, with the exception of the one shown in 
Eq. (5.26), the fitted coefficients did not correspond to a fraction of small integers 
as required. Unlikewise, for the guess shown in Eq. (5.26) all the coefficients were 
zero except a 4 = — 1 and a w = 1. If eventually there was one extra term missing 



(5.26) 
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in Eq. (5.26) (for example a term on KfjKjkKki), the parameters other than 04 and 
a w would have some bogus value to compensate for the missing term. Accordingly, 
finding only two non-zero coefficients is a strong evidence of the lack of additional 
corrections other than those shown in Eq. (5.26). Indeed, this is corroborated by 
the value of the squared mean deviation, which is of the same order as c 6 , as we 
would expect in an expansion with no supplementary term on c 5 missing. Finally, 
the expansion up to order 0(c 5 ) is given by 

S = S 2 " spm + loop - ]T K l3 K] k Kl imi m 3 UL 3 Ll + 0(c 6 ) . (5.27) 

i,j,k 

In the particular case of a system with zero magnetization, the guesses of the 
corrections are much simpler since there is no arbitrary polynomial on m multiplying 
each term. We could then find all the terms of the expansion up to 0(c 8 ): 




(5.28) 



5.5 Expansion in n-spin diagrams 

All the results seen up to now were only valid on a small correlation limit. Un- 
fortunately, for actual neuron data, there might be two or more neurons with very 
strongly correlated activity. Consequently, here we will try another approach, based 
on the fact that neurons spend the most of their time at rest. Their magnetization 
is thus very close to —1 (or +1, depending on which convention one chooses for the 
rest state). We derive thus an expansion of the couplings valid for values of magne- 
tization close to ±1. The technical details can be found on appendix B. Our final 
result is 

jk-spin diagrams = j2- Sp in + jS-spin + _ jk-spin _ ( repeated diagrams ) , ( 5 . 2 9) 

and the error is given by 

J*. = J>™ dia s rams + O [(1 - m>) k - 2 ] . (5.30) 

where J^" spin is the sum of all diagrams in the expansion of Jy involving k spins. 

The results seen previously in this work (see Eq. (4.38)) suggest that J^" spms is 
of order 0(c k ), with the lowest order diagram being the loop over k spins, thus 

J*. = J*™ dia § rams + O (c k+1 ) . (5.31) 

We can then expect that summing all diagrams up to k spins might be a very good 
approximation both in the strong magnetization regime (see Eq. (5.30)) and in the 
week correlation one (see Eq. (5.31)). 
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Unfortunately, even for values of k as small as k = 4, we cannot find an exact 
expression for J^" spm as we did for k = 2 in Eq. (3.7). In their PNAS paper, Cocco 

et al. [Cocco 09] note that J^" spin can be obtained numerically, by exact enumeration 
of all 2 k possible states of a /c-spin system. Using this method, they could sum all 
diagrams up to 7 spins. They could also combine this method with summing all 
loop diagrams, which has improved the performance of their inference. 
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In part II, we have seen a very general treatment of the inverse Ising problem. 
In this part, we are interested in a particular case of the same problem: inferring 
the patterns of a Hopfield model, introduced in section 2.4. 

In chap. 6, we deal with the problem of inferring a set of p patterns from the 
measured data under the supposition that the number of patterns is a non-extensive 
quantity. We derive explicit formulas for the patterns as a function of the mag- 
netizations and correlations in both the paramagnetic and ferromagnetic phases of 
the model in the limit of large system size. Interestingly, for the paramagnetic case 
we find in the leading order the same formula found in section 5.1 for the loop 
summation. 

The goal of chapter 7 is to find an estimation of how many times one needs to 
measure a Hopfield system to be able to have a good estimate of its patterns. To this 
end, we use the concept of Shannon entropy introduced in chapter 3 to estimate the 
quantity of information we lack about the system. We evaluate explicitly the entropy 
for a typical realization of the system as a function of the number of measurements. 
We find that when the system is magnetized according to one of the patterns, we 
can find this pattern using just a non-extensive number of measures. On the other 
hand, to find the patterns that were not visited in any of our measures one needs an 
extensive number of measures. 
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Chapter 6 

Pattern inference for the Hopfield 
model 

Up to this point we have dealt with the problem of inferring a coupling matrix 
{ Jij} of a generalized Ising model. In this chapter, we will look at the particular case 
of a Hopfield model. There are several potential advantages of this model: first, as 
we saw in section 1.1.2, in some experiments one needs to infer a set of patterns from 
the measured data. Secondly, we might expect that reducing the number of degrees 
of freedom might make the inference procedure more stable. Finally, the Hopfield 
model can be solved analytically and thus we expect to have a better control of the 
inference errors. 

In principle, one could first infer the matrix $\{J_{ij}\}$ from the data, as we have done in part II, and then diagonalize it to extract a set of patterns. The problem with this approach is that it is not optimal from the Bayesian point of view: the inferred patterns are not the ones that maximize the a posteriori probability. This is particularly relevant when the assumption that the underlying system is governed by a Hopfield model is only an approximation, as will almost always be the case for biological data. In this case, we cannot guarantee that the patterns obtained by diagonalizing the $\{J_{ij}\}$ matrix are the ones that best describe the data.

This approach was followed in a paper recently published by Haiping Huang [Huang 10], in which several different methods for solving the inverse Ising problem were used to find the couplings of a Hopfield model. That paper shows that the method presented in chapters 4 and 5 does not perform significantly better than the naive mean-field method for a Hopfield model, which gives yet another reason to look for a method specific to this model.

In this chapter, we suppose that we have measured L configurations of a system 
that is governed by a Hopfield model. We would like to deduce both the sign and the 
magnitude of the patterns from the data using a Bayesian inference, as defined in 
chapter 3. We suppose that we wait long enough between two successive measures 
so that they show no temporal correlation, i.e., that our configurations constitute 
an independent and identically distributed sampling of a Boltzmann distribution. 

We start with the simple, albeit not very useful, case of a Hopfield model with a single pattern. In this case, the calculations can be done in a few lines and give an idea of the structure of the solution in the general case. Afterwards, we deal with the case of an arbitrary number of patterns p, which is the main result of this chapter.



6.1 A simpler case: inference of a single pattern 



In this section we suppose that we have measured L independent configurations of the system, $\sigma^1, \dots, \sigma^L$, where each configuration is given by $\sigma^l = \{\sigma^l_1, \dots, \sigma^l_N\}$ with $\sigma^l_i = \pm 1$. From these data we can, for example, evaluate the measured magnetizations and correlations

$$m_i = \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l, \qquad c_{ij} = \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l\sigma_j^l. \tag{6.1}$$
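As a minimal illustration of Eq. (6.1), the sketch below computes the measured magnetizations and correlations from an array of ±1 configurations with NumPy. The function name `empirical_statistics` and the toy data are hypothetical and are not taken from the original work.

```python
import numpy as np

def empirical_statistics(samples):
    """Measured magnetizations m_i and correlations c_ij of Eq. (6.1).

    samples: array of shape (L, N) with entries +1/-1, one row per configuration.
    """
    samples = np.asarray(samples, dtype=float)
    n_samples = samples.shape[0]
    m = samples.mean(axis=0)                # m_i = (1/L) sum_l sigma_i^l
    c = samples.T @ samples / n_samples     # c_ij = (1/L) sum_l sigma_i^l sigma_j^l
    return m, c

# Three made-up configurations of N = 4 spins.
samples = np.array([[1, 1, -1, 1],
                    [1, -1, -1, 1],
                    [1, 1, 1, -1]])
m, c = empirical_statistics(samples)
# m = [1, 1/3, -1/3, 1/3]; the diagonal of c is 1 because (sigma_i^l)^2 = 1.
```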

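We suppose that our system can be described by a Hopfield model with a single pattern. Its Hamiltonian is thus given by

$$H = -\frac{1}{N}\sum_{i<j}\xi_i\xi_j\sigma_i\sigma_j, \tag{6.2}$$

where $\xi_i$ are real values that describe the pattern, $\sigma_i$ are the spin variables and N is the number of spins of the system. The partition function is given by

$$Z(\beta, \{\xi_i\}) = \sum_{\{\sigma_i\}}\exp\Big(\frac{\beta}{N}\sum_{i<j}\xi_i\xi_j\sigma_i\sigma_j\Big). \tag{6.3}$$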

For the rest of this chapter we avoid the explicit dependence on $\beta$ by performing the change of variables $\xi_i \to \xi_i/\sqrt{\beta}$. We also omit the dependence of the partition function on $\{\xi_i\}$ to simplify notations, writing $Z(\beta, \{\xi_i\}) = Z$.

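We would like to infer both the sign and the magnitude of $\xi_i$ from the measured configurations $\{\sigma^l\}$. Using Bayes' theorem as stated in Eq. (3.10), the a posteriori probability of the patterns is given by

$$P(\{\xi_i\} \mid \{\sigma^l\}) = \frac{P_0(\{\xi_i\})}{Z^L\,P(\{\sigma^l\})}\prod_{l=1}^{L}\exp\Big(\frac{1}{N}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big). \tag{6.4}$$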



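Using Eq. (6.1), we can rewrite this expression as

$$-\frac{1}{L}\log P(\{\xi_i\} \mid \{\sigma^l\}) = \log Z + \frac{1}{L}\log P(\{\sigma^l\}) - \frac{1}{L}\log P_0(\{\xi_i\}) - \frac{1}{N}\sum_{i<j}c_{ij}\,\xi_i\xi_j. \tag{6.5}$$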



To maximize this expression, we need to evaluate $\log Z$ explicitly. Using an integral (Hubbard-Stratonovich) transform, Eq. (6.3) becomes

$$Z = e^{-\frac{1}{2N}\sum_i\xi_i^2}\sum_{\{\sigma_i\}}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi N^{-1}}}\exp\Big(-\frac{N}{2}x^2 + x\sum_i\xi_i\sigma_i\Big) \tag{6.6}$$

$$Z = e^{-\frac{1}{2N}\sum_i\xi_i^2}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi N^{-1}}}\exp\Big(-\frac{N}{2}x^2 + \sum_i\log[2\cosh(x\xi_i)]\Big). \tag{6.7}$$

In the following, we set the prior to $P_0(\{\xi_i\}) = 1$, since Eq. (6.5) shows that it is irrelevant for large values of L. We treat separately the ferromagnetic case ($x \neq 0$ at the saddle point) and the paramagnetic case ($x = 0$).
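The integral representation of Eq. (6.7) can be checked numerically on a small system. The sketch below compares the exact sum over the $2^N$ spin configurations with a one-dimensional quadrature of Eq. (6.7); the pattern values, grid bounds and resolution are arbitrary illustrative choices.

```python
import itertools
import numpy as np

def log_z_enumeration(xi):
    """log Z of Eq. (6.3) (beta absorbed into xi) by summing over all 2^N states."""
    n = len(xi)
    norm2 = float(np.dot(xi, xi))
    total = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=n):
        u = float(np.dot(xi, s))
        # (1/N) sum_{i<j} xi_i xi_j s_i s_j = (u^2 - sum_i xi_i^2) / (2N)
        total += np.exp((u * u - norm2) / (2.0 * n))
    return np.log(total)

def log_z_integral(xi, x_max=20.0, n_grid=20001):
    """log Z from the one-dimensional integral of Eq. (6.7), by a Riemann sum on a fine grid."""
    n = len(xi)
    x = np.linspace(-x_max, x_max, n_grid)
    dx = x[1] - x[0]
    exponent = -0.5 * n * x**2 + np.log(2.0 * np.cosh(np.outer(x, xi))).sum(axis=1)
    integral = np.sqrt(n / (2.0 * np.pi)) * np.exp(exponent).sum() * dx
    return -float(np.dot(xi, xi)) / (2.0 * n) + np.log(integral)

xi = np.array([0.9, 1.2, -0.7, 0.5, 1.1, -1.3])   # hypothetical pattern, N = 6
print(log_z_enumeration(xi), log_z_integral(xi))   # the two values should agree
```

Because the integrand is smooth and decays like a Gaussian, a plain uniform-grid sum is already very accurate here.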

6.1.1 Ferromagnetic case 

In the ferromagnetic phase, we can use the saddle-point method to evaluate the integral in Eq. (6.7), yielding

$$\log Z = -\frac{N}{2}x^2 + \sum_i\log[2\cosh(x\xi_i)] + O(1/N), \tag{6.8}$$

with x given by

$$x = \frac{1}{N}\sum_i\xi_i\tanh(x\xi_i). \tag{6.9}$$

To infer $\{\xi_i\}$ from the data, we follow the maximum likelihood principle and maximize Eq. (6.5) with respect to $\{\xi_i\}$, obtaining

$$\frac{1}{N}\sum_j c_{ij}\,\xi_j = x\tanh(x\xi_i). \tag{6.10}$$

It is easy to see that for such a system the correlation is dominated by its non-connected part: $c_{ij} = m_i m_j + O(1/N)$. Applying this result to Eq. (6.10), and noting that in this phase $m_i = \tanh(x\xi_i)$, we obtain

$$\xi_i = \frac{1}{x}\tanh^{-1}m_i, \tag{6.11}$$

and

$$x^2 = \frac{1}{N}\sum_i m_i\tanh^{-1}m_i. \tag{6.12}$$

We find that the pattern is simply a function of the local magnetization, which is not very surprising given that we are dealing with the ferromagnetic phase.
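Eqs. (6.11) and (6.12) can be checked with a round trip: generate magnetizations from a known pattern through the saddle point of Eq. (6.9), then invert them. This is a sketch under the assumption of noiseless data; the helper names and the pattern are illustrative.

```python
import numpy as np

def ferro_saddle_point(xi, n_iter=2000):
    """Nonzero solution x of Eq. (6.9), x = (1/N) sum_i xi_i tanh(x xi_i), by fixed-point iteration."""
    x = 1.0
    for _ in range(n_iter):
        x = float(np.mean(xi * np.tanh(x * xi)))
    return x

def infer_single_pattern_ferro(m):
    """Pattern from the local magnetizations, Eqs. (6.11)-(6.12); requires |m_i| < 1."""
    atanh_m = np.arctanh(m)
    x = np.sqrt(np.mean(m * atanh_m))      # Eq. (6.12)
    return atanh_m / x                     # Eq. (6.11)

# Round trip on a hypothetical pattern in the ferromagnetic phase ((1/N) sum_i xi_i^2 > 1).
xi_true = np.linspace(0.8, 2.0, 50)
x = ferro_saddle_point(xi_true)
m = np.tanh(x * xi_true)                   # local magnetizations in the ferromagnetic state
xi_inferred = infer_single_pattern_ferro(m)
```

The inversion is exact here because $(1/N)\sum_i m_i\tanh^{-1}m_i = x \cdot (1/N)\sum_i \xi_i\tanh(x\xi_i) = x^2$ at the saddle point.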

6.1.2 Paramagnetic case 

In the paramagnetic case, the saddle point is $x = 0$. Consequently, we need to evaluate the next term in the large-N limit to find a non-trivial partition function. Expanding $\log\cosh(x\xi_i) \simeq x^2\xi_i^2/2$ in Eq. (6.7), we get

$$Z \simeq 2^N e^{-\frac{1}{2N}\sum_i\xi_i^2}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi N^{-1}}}\exp\Big[-\frac{N}{2}x^2\Big(1 - \frac{1}{N}\sum_i\xi_i^2\Big)\Big] = 2^N e^{-\frac{1}{2N}\sum_i\xi_i^2}\Big(1 - \frac{1}{N}\sum_i\xi_i^2\Big)^{-1/2}. \tag{6.13}$$

Maximizing Eq. (6.5) with respect to $\{\xi_i\}$, we obtain

$$\sum_j c_{ij}\,\xi_j = \frac{\xi_i}{1 - \frac{1}{N}\sum_j\xi_j^2}. \tag{6.14}$$

We thus conclude that $\xi_i$ is proportional to an eigenvector $v_i$ of the matrix $c$. The corresponding eigenvalue $\lambda$ gives the proportionality constant

$$\xi_i = \sqrt{N\Big(1 - \frac{1}{\lambda}\Big)}\;v_i, \tag{6.15}$$

where the eigenvectors are normalized as

$$\sum_j v_j^2 = 1. \tag{6.16}$$

It remains to decide which eigenvalue/eigenvector pair we should pick to obtain our pattern. Since we assume that our patterns are real-valued, Eq. (6.15) implies that we should choose an eigenvalue greater than one. Moreover, if more than one eigenvalue satisfies this condition, we can easily deduce from Eqs. (6.15), (6.16) and (6.4) that the largest eigenvalue is the one that maximizes the likelihood.
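A minimal NumPy sketch of Eqs. (6.14)-(6.16): take the largest eigenvalue of the correlation matrix, check that it exceeds one, and rescale its unit eigenvector. The test matrix below is built to solve Eq. (6.14) exactly for a hypothetical pattern; it only checks the algebra and is not a realistic spin correlation matrix (its diagonal is not exactly one).

```python
import numpy as np

def infer_single_pattern_para(c):
    """Paramagnetic single-pattern inference, Eqs. (6.14)-(6.16).

    c: N x N correlation matrix of Eq. (6.1). Returns the pattern (defined up to a global
    sign), or None when the largest eigenvalue does not exceed one.
    """
    n = c.shape[0]
    eigvals, eigvecs = np.linalg.eigh(c)       # ascending eigenvalues, unit-norm eigenvectors
    lam, v = eigvals[-1], eigvecs[:, -1]       # the largest eigenvalue maximizes the posterior
    if lam <= 1.0:
        return None                            # Eq. (6.15) would give an imaginary pattern
    return np.sqrt(n * (1.0 - 1.0 / lam)) * v  # Eq. (6.15) with sum_j v_j^2 = 1

# Matrix solving Eq. (6.14) for a hypothetical pattern with (1/N) sum_i xi_i^2 < 1:
# c = 1 + xi xi^T / (N - sum_i xi_i^2).
n = 40
xi_true = np.sin(np.arange(n))
c = np.eye(n) + np.outer(xi_true, xi_true) / (n - xi_true @ xi_true)
xi_inferred = infer_single_pattern_para(c)
```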

We verified this formula using Monte-Carlo simulations and the inferred patterns 
showed a good agreement with the real ones used to make the simulation. 



6.2 Inference of continuous patterns for p > 1 

In this section, we look at the more general and interesting case of a system with several patterns. As in the previous section, we treat the ferromagnetic and paramagnetic cases separately. First, however, we show that the problem is harder to define in this case.



6.2.1 Discussion on the gauge 



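A major issue in the inference of the patterns of the Hopfield model is that, if the patterns can take real values, the problem is ill-defined: many different patterns describe the data equally well. Consider for example the case p = 2:

$$H = -\frac{1}{2N}\Big(\sum_i\xi_i^1\sigma_i\Big)^2 - \frac{1}{2N}\Big(\sum_i\xi_i^2\sigma_i\Big)^2. \tag{6.17}$$

If we define the alternative patterns $\tilde\xi_i^1 = \xi_i^1\cos\theta + \xi_i^2\sin\theta$ and $\tilde\xi_i^2 = -\xi_i^1\sin\theta + \xi_i^2\cos\theta$, our new Hamiltonian is

$$\tilde H = -\frac{1}{2N}\Big(\sum_i(\xi_i^1\cos\theta + \xi_i^2\sin\theta)\sigma_i\Big)^2 - \frac{1}{2N}\Big(\sum_i(-\xi_i^1\sin\theta + \xi_i^2\cos\theta)\sigma_i\Big)^2 \tag{6.18}$$

$$\tilde H = H. \tag{6.19}$$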
More generally, if one has p patterns, a rotation in p dimensions does not change the Hamiltonian of the system. We say that the model has a gauge invariance with respect to such rotations. Thus, to infer the patterns one needs either to add constraints that remove the p(p − 1)/2 extra degrees of freedom, or to add a prior probability on the patterns that selects a particular preferred rotation. In the following, we choose the former solution, since it makes our calculations simpler.
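The gauge invariance can be checked numerically: mixing the p patterns with any orthogonal matrix (rotations included) leaves the couplings $J_{ij} \propto \sum_\mu\xi_i^\mu\xi_j^\mu$, and hence the Hamiltonian, unchanged. A short NumPy sketch with randomly generated hypothetical patterns:

```python
import numpy as np

def couplings(patterns):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu for patterns of shape (p, N)."""
    return patterns.T @ patterns / patterns.shape[1]

def random_orthogonal(p, rng):
    """Random p x p orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((p, p)))
    return q * np.sign(np.diag(r))   # flipping column signs keeps q orthogonal

rng = np.random.default_rng(0)
patterns = rng.standard_normal((3, 20))          # p = 3 hypothetical patterns on N = 20 sites
rotated = random_orthogonal(3, rng) @ patterns   # xi~^mu = sum_nu O_{mu nu} xi^nu
```

The orthogonal group in p dimensions has exactly p(p − 1)/2 continuous parameters, which is the number of degrees of freedom that the gauge constraints must remove.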

6.2.2 Ferromagnetic phase 

Suppose that we have a sample of L measures of our system, containing $l_1$ measures where the system was magnetized according to the first pattern, $l_2$ measures where it was magnetized according to the second, and so on. In this case, denoting by $I_k$ the set of the $l_k$ measures magnetized according to pattern k, we have

$$c_{ij} = \frac{1}{L}\sum_{k=1}^{p}\sum_{l\in I_k}\sigma_i^l\sigma_j^l \simeq \sum_{k=1}^{p}\frac{l_k}{L}\,m_i^k m_j^k, \tag{6.20}$$

where

$$m_k = \frac{1}{N}\sum_i\xi_i^k\tanh(m_k\xi_i^k), \tag{6.21}$$

and

$$m_i^k = \tanh(m_k\xi_i^k). \tag{6.22}$$

A consequence of Eq. (6.20) is that the matrix $c_{ij}$ has exactly p extensive eigenvalues, and the corresponding eigenvectors are proportional to $m_i^k$. We can thus easily solve Eqs. (6.20)-(6.22) by diagonalizing the matrix $c_{ij}$. Note that we had not fixed a gauge when writing these equations, but requiring that $\{m_i^k\}$ be proportional to the eigenvectors of the matrix $c_{ij}$ implies that they are orthogonal. Thus, this procedure is equivalent to choosing the gauge that satisfies

$$\sum_i\tanh(m_k\xi_i^k)\tanh(m_{k'}\xi_i^{k'}) = 0, \quad\text{for } k \neq k'. \tag{6.23}$$
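The claim that the extensive eigenvalues of $c_{ij}$ carry the profiles $m_i^k$ can be illustrated on a synthetic correlation matrix built directly from Eq. (6.20). The sketch assumes two orthogonal, hypothetical magnetization profiles and given fractions $l_k/L$; it only checks the eigenstructure and does not implement the subsequent use of Eqs. (6.21)-(6.22).

```python
import numpy as np

def leading_modes(c, p):
    """The p largest eigenvalues of c (descending) and their unit eigenvectors (columns)."""
    eigvals, eigvecs = np.linalg.eigh(c)
    order = np.argsort(eigvals)[::-1][:p]
    return eigvals[order], eigvecs[:, order]

n = 60
sites = np.arange(n)
m1 = np.where(sites < n // 2, 0.8, -0.8)                # hypothetical profile m_i^1
m2 = np.where((sites // 15) % 2 == 0, 0.6, -0.6)        # hypothetical profile m_i^2, orthogonal to m_i^1
fractions = (0.7, 0.3)                                  # l_1 / L and l_2 / L
c = fractions[0] * np.outer(m1, m1) + fractions[1] * np.outer(m2, m2)   # Eq. (6.20)
lams, vecs = leading_modes(c, 2)
```

The two leading eigenvalues equal $(l_k/L)\sum_i (m_i^k)^2$, which grow linearly with N, as stated in the text.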

6.2.3 High external field case 

In this section, we suppose that the external field is strong enough that the spins are not magnetized according to any of the patterns, but only according to the external field. We are interested in inferring both the patterns of our model and the values of the external fields, which might be site-dependent. The calculations that follow are considerably simplified by replacing our usual Hamiltonian (Eq. (2.32)) by a slightly different one:

$$H = -\frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu(\sigma_i - \tanh h_i)(\sigma_j - \tanh h_j) - \sum_i h_i\sigma_i. \tag{6.24}$$

This Hamiltonian can be related to the usual one by a translation of the local fields: up to a constant,

$$H = -\frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j - \sum_i\Big(h_i - \frac{1}{N}\sum_{\mu=1}^{p}\sum_{j\neq i}\xi_i^\mu\xi_j^\mu\tanh h_j\Big)\sigma_i. \tag{6.25}$$

From Eq. (3.10), the a posteriori probability for the inference is then given by

$$P(\{\xi_i^\mu\},\{h_i\} \mid \{\sigma^l\}) = \frac{P_0(\{\xi_i^\mu\},\{h_i\})}{Z^L\,P(\{\sigma^l\})}\prod_{l=1}^{L}\exp\Big[\frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu(\sigma_i^l - \tanh h_i)(\sigma_j^l - \tanh h_j) + \sum_i h_i\sigma_i^l\Big]. \tag{6.26}$$

Introducing the measured values of the magnetizations and of the connected correlations

$$m_i = \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l, \qquad c_{ij} = \frac{1}{L}\sum_{l=1}^{L}\sigma_i^l\sigma_j^l - m_i m_j, \tag{6.27}$$

we can rewrite our probability as

$$P(\{\xi_i^\mu\},\{h_i\} \mid \{\sigma^l\}) = \frac{P_0(\{\xi_i^\mu\},\{h_i\})}{Z^L\,P(\{\sigma^l\})}\exp\Big\{\frac{L}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu\big[c_{ij} + (m_i - \tanh h_i)(m_j - \tanh h_j)\big] + L\sum_i h_i m_i\Big\}. \tag{6.28}$$

To follow our Bayesian approach of maximizing this probability with respect to the patterns and the external fields, we first need to evaluate $\log Z$ explicitly. Since this calculation is straightforward, its details are given in appendix D. We obtain



log Z = J2 M2 cosh hl )-W log X , + -L J2 Sj ^^ + 0(1 /N^) , 



where 



flU 



' )J,V 



^ £(tf) 2 (O a (l - 3 tanh 2 - tanh 2 h t ) , 

i 

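$$\lambda_\mu = 1 - \frac{1}{N}\sum_i(\xi_i^\mu)^2(1 - \tanh^2 h_i),$$

$$s_{\mu\nu} = \frac{1}{N}\sum_i\xi_i^\mu\xi_i^\nu(1 - \tanh^2 h_i).$$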



(6.29) 

(6.30) 
(6.31) 
(6.32) 






Optimization of the probability 

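In the following, we ignore the prior, which is justified in the limit $L \to \infty$. To maximize Eq. (6.28), it is enough to maximize the following quantity:

$$\frac{1}{L}\log P = \frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu c_{ij} + \frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu(m_i - \tanh h_i)(m_j - \tanh h_j) + \sum_i h_i m_i - \log Z. \tag{6.33}$$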



As discussed in section 6.2.1, we need to choose a gauge to make our problem well-defined. From Eq. (6.29), a natural choice that simplifies our equations is to add Lagrange multipliers $x_{\mu\nu}$ that fix the gauge $s_{\mu\nu} = 0$ for $\mu \neq \nu$. We thus obtain

l0g P = E ^'' r '' + 2^ E &% M ( m * ~ tanh ^)( m i ~ tanh + 

- £ log(2 cosh hl ) + i £ log - ^ £ ^- - £ • 

(6.34) 

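Optimizing with respect to $h_i$, we obtain

$$\frac{1}{L}\frac{\partial\log P}{\partial h_i} = m_i - \tanh h_i - \frac{1}{N}(1 - \tanh^2 h_i)\sum_\mu\xi_i^\mu\sum_j\xi_j^\mu(m_j - \tanh h_j) + \frac{1}{N}\sum_\mu\frac{(\xi_i^\mu)^2}{\lambda_\mu}\tanh h_i(1 - \tanh^2 h_i) + O(1/N^2) = 0. \tag{6.35}$$

Defining

$$a_\mu = \sum_i\xi_i^\mu(m_i - \tanh h_i), \tag{6.36}$$

we can multiply Eq. (6.35) by $\xi_i^\nu$ and sum over i, yielding

$$\lambda_\nu a_\nu = -\frac{1}{N}\sum_\mu\frac{1}{\lambda_\mu}\sum_i(\xi_i^\mu)^2\xi_i^\nu\tanh h_i(1 - \tanh^2 h_i). \tag{6.37}$$

Eq. (6.37) shows that, unless the matrix $s_{\mu\nu}$ happens to have an eigenvalue equal to $\lambda_\mu$, $a_\mu$ is of order $1/\sqrt{N}$. We can then rewrite Eq. (6.35) as

$$m_i - \tanh h_i = -\frac{1}{N}\sum_\mu\frac{(\xi_i^\mu)^2}{\lambda_\mu}\tanh h_i(1 - \tanh^2 h_i) + O(1/N^{3/2}), \tag{6.38}$$

which shows clearly that $m_i = \tanh h_i + O(1/N)$.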

We now optimize our probability with respect to $\xi_i^\mu$:

= tanh 2 ^)-^£ ^|^(l -Stanh 2 /,,)(!- tanh 2 /.,) 

-4 £ - tanh2 ^ - 4 E - tanh 2 K) 

+0(1/A^ 2 ) , (6.39) 
where we used the fact that $\frac{1}{N}(m_i - \tanh h_i)\,a_\mu = O(1/N^{5/2})$ to ignore subleading terms. In the following, we start by solving this equation at leading order in N.



Solution in the leading order 

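We want to find the solutions $\xi_i^{\mu,0}$ and $h_i^0$ of Eqs. (6.39) and (6.38) at leading order in N. Ignoring subleading terms, these equations are respectively given by

$$\sum_j c_{ij}\,\xi_j^{\mu,0} = \frac{\xi_i^{\mu,0}}{\lambda_\mu^0}(1 - \tanh^2 h_i^0), \tag{6.40}$$

and

$$h_i^0 = \tanh^{-1}m_i. \tag{6.41}$$

If we define $v_i^\mu$ as the eigenvector associated with the eigenvalue $\lambda_\mu$ of the matrix

$$M_{ij} = \frac{c_{ij}}{\sqrt{1 - \tanh^2 h_i^0}\,\sqrt{1 - \tanh^2 h_j^0}}, \tag{6.42}$$

we have

$$\xi_i^{\mu,0} = \sqrt{N\Big(1 - \frac{1}{\lambda_\mu}\Big)}\;\frac{v_i^\mu}{\sqrt{1 - \tanh^2 h_i^0}}, \tag{6.43}$$

with the eigenvectors normalized as $\sum_i(v_i^\mu)^2 = 1$. Comparing Eq. (6.40) with the eigenvalue equation of M, the eigenvalue $\lambda_\mu$ of M is related to the coefficient $\lambda_\mu^0$ of Eq. (6.40) by $\lambda_\mu = 1/\lambda_\mu^0$. Note that the orthogonality of the eigenvectors ensures that our gauge $s_{\mu\nu} = 0$ is respected.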

The procedure of diagonalizing the correlation matrix to extract underlying information is known in the statistics literature as Principal Component Analysis (PCA). We already mentioned at the end of section 1.1.2 how this method is currently used to find patterns in neuronal data. It was also used by Ranganathan et al. [Halabi 09] to find functional groups in proteins. Our approach gives a Bayesian justification for using PCA on neuron data, and allows us to write the probability distribution of the measured system: it is simply the Boltzmann distribution of the inferred Hopfield model.
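A minimal NumPy sketch of the leading-order procedure of Eqs. (6.41)-(6.43) follows. To test it, we build a synthetic connected-correlation matrix whose leading-order solution is known exactly; the numbers are hypothetical and this is not the code used for the results of this chapter.

```python
import numpy as np

def infer_patterns_leading_order(m, c, p):
    """Leading-order fields and patterns, Eqs. (6.41)-(6.43).

    m: measured magnetizations (|m_i| < 1); c: connected correlations of Eq. (6.27);
    p: number of patterns. Returns h0 of shape (N,) and patterns of shape (p, N).
    The p largest eigenvalues of M must exceed one for the patterns to be real.
    """
    n = len(m)
    h0 = np.arctanh(m)                              # Eq. (6.41)
    scale = np.sqrt(1.0 - m**2)                     # sqrt(1 - tanh^2 h_i^0)
    big_m = c / np.outer(scale, scale)              # Eq. (6.42)
    eigvals, eigvecs = np.linalg.eigh(big_m)
    top = np.argsort(eigvals)[::-1][:p]             # largest eigenvalues first
    lam, v = eigvals[top], eigvecs[:, top]          # unit-norm eigenvectors as columns
    patterns = (np.sqrt(n * (1.0 - 1.0 / lam)) * v).T / scale   # Eq. (6.43)
    return h0, patterns

# Synthetic check with two hypothetical patterns chosen so that
# u^mu = sqrt(1 - m_i^2) xi^mu are orthogonal (the gauge s_{mu nu} = 0).
n = 40
sites = np.arange(n)
m = 0.3 * np.cos(sites)
scale = np.sqrt(1.0 - m**2)
u1 = np.where(sites < n // 2, 0.7, -0.7)
u2 = np.where(sites % 2 == 0, 0.5, -0.5)
big_m = np.eye(n) + np.outer(u1, u1) / (n - u1 @ u1) + np.outer(u2, u2) / (n - u2 @ u2)
c = big_m * np.outer(scale, scale)                  # undo the rescaling of Eq. (6.42)
h0, patterns = infer_patterns_leading_order(m, c, 2)
xi_true = np.vstack([u1, u2]) / scale
```

Each inferred pattern is defined only up to a sign, so a gauge-invariant comparison such as $\sum_\mu \xi_i^\mu\xi_j^\mu$ is the natural check.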

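Note that Eqs. (6.42) and (6.43) imply

$$J_{ij} = \frac{1}{N}\sum_\mu\xi_i^{\mu,0}\xi_j^{\mu,0} = \sum_\mu\Big(1 - \frac{1}{\lambda_\mu}\Big)\frac{v_i^\mu v_j^\mu}{\sqrt{(1 - m_i^2)(1 - m_j^2)}} = \frac{[M^{-1}(M - 1)]_{ij}}{\sqrt{(1 - m_i^2)(1 - m_j^2)}}, \tag{6.44}$$

where we used $\tanh h_i^0 = m_i$, and the last equality holds when the sum runs over all N eigenvectors of M. This is exactly the same formula found in section 5.1 for the loop summation, but using the convention $M_{ii} = 1$ instead of $M_{ii} = 0$. While this is an encouraging sign that our calculations are correct, it does not bring any new result. We thus now go beyond the leading order in N and look for the first correction to this expression.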
Sub-dominant corrections

Applying our first approximation of the patterns to Eq. (6.38), we can evaluate the first correction to the field $h_i$. Writing $h_i = h_i^0 + h_i^1$, we have

$$h_i^1 = \frac{1}{1 - \tanh^2 h_i^0}\,\frac{1}{N}\sum_\mu\frac{(\xi_i^{\mu,0})^2}{\lambda_\mu^0}\tanh h_i^0\,(1 - \tanh^2 h_i^0). \tag{6.45}$$

Since we now have a more precise value of $h_i$, we can redo the procedure of solving Eq. (6.40) that we used at leading order, with this new value. We denote the obtained patterns by $\xi_i^{\mu,1}$. Note that $\xi_i^{\mu,1}$ is correct only up to the leading order in N, in the same way as $\xi_i^{\mu,0}$, since it neglects the subdominant terms in Eq. (6.39). Finally, to find an expression for the patterns that is correct up to the first subleading order, $\xi_i^{\mu,2} = \xi_i^{\mu,1} + \delta\xi_i^\mu$, we perform a Taylor expansion of the dominant orders of Eq. (6.39) around $\xi_i^\mu = \xi_i^{\mu,1}$:



Ji E ~ JfV 5 * {1 " tanh2 ki) + W> E - tanh 2 hi) 

2 Y^^tf 5 ^ 1 - tanh2 W - tanh2 h i) = 



N2 xl . 

1 t^it v ^\i 

(* " 3tanh2 W ~ tanh2 



^iE^C'd-tanh^,). (6.46) 



Similarly, our gauge equations $s_{\mu\nu} = 0$ yield



-7= + ^f)(l - tanh 2 h]) = . (6.47) 

* i 

Finally, Eqs. (6.46) and (6.47) form a non-homogeneous linear system in the variables $\delta\xi_i^\mu$ and $x_{\mu\nu}$ ($\mu < \nu$), which can be solved numerically by a simple matrix inversion.



6.2.4 Numerical verification 

In this section we verify numerically the correctness of both the leading-order inference of Eqs. (6.42) and (6.43) and its subdominant corrections of Eqs. (6.46) and (6.47). We proceed in the following way: first, we choose a set of patterns $\{\xi_i^\mu\}$ and fields $\{h_i\}$. Then, we use a numerical method (explained below) to compute the correlations and local magnetizations of the corresponding Hopfield model. Using these quantities, we apply our inference procedure to find the inferred patterns, both at leading order, $\{\xi_i^{\mu,0}\}$, and with the subdominant corrections, $\{\xi_i^{\mu,2}\}$. Finally, we compare the obtained patterns with the initially chosen ones to estimate the quality of our inference.
Numerical evaluation of the correlations of the Hopfield model

To verify the validity of our inference, we needed a numerical method that produces very precise values of the correlations, so that we could be sure that any disagreement between the real and the inferred patterns was due to shortcomings of the inference procedure and not to numerical errors. Moreover, we wanted a numerical method that scales well with increasing N, since our equations are correct in the large-N limit.

To satisfy these requirements, we restricted our input patterns and fields to a four-block configuration, each block containing N/4 sites:

$$\begin{aligned}
\xi^1 &= \{a_1, \dots, a_1,\; b_1, \dots, b_1,\; c_1, \dots, c_1,\; d_1, \dots, d_1\},\\
\xi^2 &= \{a_2, \dots, a_2,\; b_2, \dots, b_2,\; c_2, \dots, c_2,\; d_2, \dots, d_2\},\\
\xi^3 &= \{a_3, \dots, a_3,\; b_3, \dots, b_3,\; c_3, \dots, c_3,\; d_3, \dots, d_3\},\\
h &= \{h_1, \dots, h_1,\; h_2, \dots, h_2,\; h_3, \dots, h_3,\; h_4, \dots, h_4\},
\end{aligned} \tag{6.48}$$

where $\{a_\mu\}$, $\{b_\mu\}$, etc. are real values. Since in our calculations we assume that $\sum_i\xi_i^\mu\xi_i^\nu = O(\sqrt{N})$ for $\mu \neq \nu$ and $\sum_i\tanh h_i = O(\sqrt{N})$, we have restricted the number of patterns to three, since with four blocks it is impossible to satisfy these conditions for more patterns. In our simulation we have chosen the values of $\{a_\mu\}$, $\{b_\mu\}$, ... so that the patterns obey our orthogonality condition and their magnitudes are small enough not to be too close to the phase transition at $|\xi_i^\mu| = 1$. With the choice of patterns of Eq. (6.48), the partition function is

$$Z = \sum_{\{\sigma\}}\exp\Big[\frac{1}{N}\sum_{\mu=1}^{3}\sum_{i<j}\xi_i^\mu\xi_j^\mu(\sigma_i - \tanh h_i)(\sigma_j - \tanh h_j) + h_1\sum_{i=1}^{N/4}\sigma_i + h_2\sum_{i=N/4+1}^{N/2}\sigma_i + h_3\sum_{i=N/2+1}^{3N/4}\sigma_i + h_4\sum_{i=3N/4+1}^{N}\sigma_i\Big]. \tag{6.49}$$

We can see that the Hamiltonian depends only on the block magnetizations $m_1 = \sum_{i=1}^{N/4}\sigma_i$, $m_2 = \sum_{i=N/4+1}^{N/2}\sigma_i$, etc. Replacing the sum over all spin configurations by a sum over all possible values of $m_1$, $m_2$, $m_3$ and $m_4$ allows one to evaluate the partition function, the correlations and the magnetizations with a complexity of $O(N^4)$.
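The reduction to block magnetizations can be sketched as follows. For a block with $n_b = N/4$ spins, the number of configurations with block sum $m_b$ is the binomial coefficient $\binom{n_b}{(n_b + m_b)/2}$, so a grid of $(N/4 + 1)^4$ points replaces the $2^N$ configurations. This is a hypothetical re-implementation using the Hamiltonian of Eq. (6.24); the function name is illustrative, and the same-block correlation formula assumes at least two spins per block.

```python
import numpy as np
from math import comb

def block_hopfield_stats(xi_blocks, h_blocks, n_block):
    """Exact log Z, magnetizations and correlations for the four-block model of Eq. (6.48).

    xi_blocks: (p, 4) pattern values on each block; h_blocks: (4,) fields;
    n_block: spins per block (N = 4 n_block, n_block >= 2). Uses the Hamiltonian (6.24).
    Returns log Z, <s_i> per block (4,), and <s_i s_j> (i != j) per pair of blocks (4, 4).
    """
    xi_blocks = np.atleast_2d(np.asarray(xi_blocks, dtype=float))
    h_blocks = np.asarray(h_blocks, dtype=float)
    n = 4 * n_block
    t = np.tanh(h_blocks)
    k = np.arange(n_block + 1)
    sums = (2 * k - n_block).astype(float)                     # possible block sums
    log_mult = np.log([float(comb(n_block, int(j))) for j in k])
    grid = np.stack(np.meshgrid(*([sums] * 4), indexing="ij"), axis=-1).reshape(-1, 4)
    log_w = np.stack(np.meshgrid(*([log_mult] * 4), indexing="ij"), axis=-1).reshape(-1, 4).sum(axis=1)
    log_w = log_w + grid @ h_blocks                            # field term sum_b h_b M_b
    diag = n_block * (1.0 + t**2) - 2.0 * t * grid             # sum_{i in b} (s_i - t_b)^2
    for xi_b in xi_blocks:
        u = (grid - n_block * t) @ xi_b                        # sum_i xi_i (s_i - tanh h_i)
        log_w = log_w + (u**2 - diag @ xi_b**2) / (2.0 * n)    # (1/N) sum_{i<j} ...
    shift = log_w.max()
    w = np.exp(log_w - shift)
    log_z = shift + np.log(w.sum())
    prob = w / w.sum()
    mag = prob @ grid / n_block                                # <s_i>, i in block b
    second = (grid * prob[:, None]).T @ grid                   # <M_a M_b>
    corr = second / n_block**2                                 # <s_i s_j>, different blocks
    corr[np.diag_indices(4)] = (np.diag(second) - n_block) / (n_block * (n_block - 1))
    return log_z, mag, corr
```

For N = 100 the grid has $26^4 \approx 4.6 \times 10^5$ points, whereas a direct enumeration would need $2^{100}$ terms.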



Comparison of the inferred and the real patterns

We cannot directly compare the inferred patterns with the real ones to evaluate the error of our procedure, since they might differ by a gauge transformation. Thus, in our comparison we used a quantity that does not depend on the gauge, $J_{ij} = \sum_\mu\xi_i^\mu\xi_j^\mu$. The results of the comparison of the real patterns with the inferred ones are shown in Figure 6.1.
Figure 6.1: Inference error for the different elements of the matrix $J_{ij}$ for N = 100. In blue we have $J_{ij}^{\mathrm{real}} - \sum_\mu\xi_i^{\mu,0}\xi_j^{\mu,0}$ and in red we have $J_{ij}^{\mathrm{real}} - \sum_\mu\xi_i^{\mu,2}\xi_j^{\mu,2}$. In this graph, we have $\{a_1, b_1, c_1, d_1\} = \{0.693, 0.4, -0.8, 0.4\}$, $\{a_2, b_2, c_2, d_2\} = \{0.693, -0.8, 0.4, 0.4\}$, $\{a_3, b_3, c_3, d_3\} = \{0, 0.693, 0.693, 0.693\}$ and $\{h_1, h_2, h_3, h_4\} = \{0.254, -0.283, 0.416, -0.380\}$.



Figure 6.1 clearly shows that the subdominant corrections improve the inference quality. Unfortunately, these corrections are very sensitive to noise in the data, and we could not see an improvement of the inference for either Monte Carlo or real neuron data.
Chapter 7

Evaluation of the inference entropy for the Hopfield model

As introduced in section 3.1, a good estimate of how much data is needed to infer a pattern is given by the information-theoretic entropy (Eq. (3.1)) of the a posteriori probability of the patterns,

$$S[\{\sigma^l\}] = -\sum_{\{\xi_i^\mu\}}P[\{\xi_i^\mu\} \mid \{\sigma^l\}]\,\log P[\{\xi_i^\mu\} \mid \{\sigma^l\}], \tag{7.1}$$

where $\{\xi_i^\mu\}$ are the patterns of our Hopfield model (see section 2.4) and $\{\sigma^l\}$ is a set of L measured configurations of the system.

The entropy can be interpreted as the quantity of information that is missing about our system. Thus, when $S \ll 1$, we can say that we have enough data to infer the patterns with very little error. In this chapter we evaluate the entropy for the inference of the Hopfield model. As before, we first treat the simpler case of a single pattern before dealing with the more general and interesting case of an arbitrary number of patterns p.



7.1 Case of a single binary pattern 

Let us recall the usual partition function of the Hopfield model (Eq. (2.32)) for p = 1:

$$Z(\beta) = \sum_{\{\sigma_i\}}\exp\Big(\frac{\beta}{N}\sum_{i<j}\xi_i\xi_j\sigma_i\sigma_j\Big). \tag{7.2}$$

In the particular case where $\xi_i = \pm 1$, we can set $\sigma_i' = \xi_i\sigma_i$ and rewrite the partition function as

$$Z(\beta) = \sum_{\{\sigma_i'\}}\exp\Big(\frac{\beta}{N}\sum_{i<j}\sigma_i'\sigma_j'\Big). \tag{7.3}$$

This equation is exactly the same as Eq. (2.6) for zero external field, which means that the thermodynamics of this model is identical to that of the infinite-dimensional Ising model presented in chapter 2. From a more technical point of view, contrary to the usual Hopfield model with p > 1, the partition function is independent of the pattern, which makes the calculations considerably simpler.
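For binary patterns, the change of variables $\sigma_i' = \xi_i\sigma_i$ implies that $Z(\beta)$ does not depend on the pattern. The sketch below checks this by exact enumeration on a small, hypothetical system.

```python
import itertools
import numpy as np

def single_pattern_z(xi, beta):
    """Z(beta) of Eq. (7.2) by exact enumeration over the 2^N spin configurations."""
    n = len(xi)
    norm2 = float(np.dot(xi, xi))
    z = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=n):
        u = float(np.dot(xi, s))
        z += np.exp(beta * (u * u - norm2) / (2.0 * n))   # (beta/N) sum_{i<j} xi_i xi_j s_i s_j
    return z

rng = np.random.default_rng(1)
binary_pattern = rng.choice([-1.0, 1.0], size=8)           # a hypothetical xi_i = +/-1 pattern
z_binary = single_pattern_z(binary_pattern, beta=1.3)
z_ferro = single_pattern_z(np.ones(8), beta=1.3)           # Eq. (7.3): all xi_i = +1
```

A real-valued pattern, in contrast, generally gives a different value of Z, which is the extra difficulty addressed in section 7.2.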

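In this case, the Bayes a posteriori probability is given by Eq. (6.4), which we recall here:

$$P(\{\xi_i\} \mid \{\sigma^l\}) = \frac{P_0(\{\xi_i\})}{Z(\beta)^L\,\mathcal{M}[\{\sigma^l\}]}\exp\Big(\frac{\beta}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big), \tag{7.4}$$

where we used a different notation, $\mathcal{M}$, for the normalization. Applying this result to Eq. (7.1) and setting the prior $P_0(\{\xi_i\}) = 1$, we obtain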



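$$S[\{\sigma^l\}] = -\sum_{\{\xi_i\}}\frac{\exp\Big(\frac{\beta}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big)}{Z(\beta)^L\,\mathcal{M}[\{\sigma^l\}]}\Big[\frac{\beta}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - L\log Z(\beta) - \log\mathcal{M}[\{\sigma^l\}]\Big]. \tag{7.5}$$

Introducing the variable

$$\mathcal{N}[\{\sigma^l\}] = \mathcal{M}[\{\sigma^l\}]\,Z(\beta)^L = \sum_{\{\xi_i\}}\exp\Big(\frac{\beta}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big), \tag{7.6}$$

we can rewrite Eq. (7.5) as

$$S[\{\sigma^l\}] = -\sum_{\{\xi_i\}}\frac{\exp\Big(\frac{\beta}{N}\sum_{l}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big)}{\mathcal{N}[\{\sigma^l\}]}\Big[\frac{\beta}{N}\sum_{l}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - \log\mathcal{N}[\{\sigma^l\}]\Big] = \log\mathcal{N}[\{\sigma^l\}] - \beta\,\frac{\partial\log\mathcal{N}[\{\sigma^l\}]}{\partial\beta}. \tag{7.7}$$

Thus, the entropy can be trivially evaluated from $\mathcal{N}[\{\sigma^l\}]$.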
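Eq. (7.7) can be checked on a toy example. For small N one can enumerate all $2^N$ binary patterns, compute the posterior entropy directly from Eq. (7.1), and compare it with $\log\mathcal{N} - \beta\,\partial_\beta\log\mathcal{N}$. This sketch assumes a flat prior and approximates the β-derivative by a central finite difference; the measured configurations are made up.

```python
import itertools
import numpy as np

def entropy_two_ways(samples, beta, d_beta=1e-5):
    """Posterior entropy for a binary pattern with flat prior: directly from Eq. (7.1)
    and through Eq. (7.7), using a central finite difference for the beta derivative.

    samples: (L, N) array of +/-1 configurations (small N: enumerates 2^N patterns).
    """
    n = samples.shape[1]
    patterns = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    overlaps = patterns @ samples.T                           # xi . sigma^l, shape (2^N, L)
    energy = ((overlaps**2 - n) / 2.0).sum(axis=1) / n        # (1/N) sum_l sum_{i<j} xi_i xi_j s_i^l s_j^l

    def log_norm(b):
        """log of the normalization N[{sigma^l}] of Eq. (7.6), computed stably."""
        a = b * energy
        return a.max() + np.log(np.exp(a - a.max()).sum())

    log_post = beta * energy - log_norm(beta)                 # log of Eq. (7.4); Z(beta)^L cancels
    direct = -(np.exp(log_post) * log_post).sum()
    d_log_norm = (log_norm(beta + d_beta) - log_norm(beta - d_beta)) / (2.0 * d_beta)
    via_norm = log_norm(beta) - beta * d_log_norm
    return direct, via_norm

samples = np.array([[1, 1, -1, 1, 1, -1],
                    [1, -1, -1, 1, 1, 1],
                    [1, 1, 1, -1, 1, -1]], dtype=float)      # made-up measurements, N = 6
direct, via_norm = entropy_two_ways(samples, beta=0.8)
```

Since $\xi$ and $-\xi$ always have the same posterior weight, the entropy can never drop below $\log 2$.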

We can see in Eq. (7.6) that the expression for $\mathcal{N}$ is formally identical to the partition function of a Hopfield model in which the L measured configurations $\{\sigma^l\}$ play the role of the patterns and the pattern $\{\xi_i\}$ plays the role of the spin variables:

$$Z_{\mathrm{hop}} = \sum_{\{\sigma_i\}}\exp\Big(\frac{\beta}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j\Big). \tag{7.8}$$

Given this analogy, the inference entropy of Eq. (7.7) has a thermodynamic meaning: it is the thermodynamic entropy of this model.

Eqs. (7.6) and (7.7) give the entropy of the system for a particular set of measures $\{\sigma^l\}$. It is natural to expect the entropy to be very reproducible across different sets of measurements. In this context, we are interested in evaluating the average of the entropy over all possible measurements. Supposing that the data are produced by measuring an actual Hopfield model with a pattern $\{\hat\xi_i\}$, we have

$$\langle S\rangle = \big\langle\log\mathcal{N}[\{\sigma^l\}]\big\rangle - \beta\,\frac{\partial}{\partial\beta}\big\langle\log\mathcal{N}[\{\sigma^l\}]\big\rangle, \tag{7.9}$$

with

$$\big\langle\log\mathcal{N}\big\rangle = \frac{1}{Z(\hat\beta)^L}\sum_{\{\sigma^l\}}\exp\Big(\frac{\hat\beta}{N}\sum_{l=1}^{L}\sum_{i<j}\hat\xi_i\hat\xi_j\sigma_i^l\sigma_j^l\Big)\log\mathcal{N}[\{\sigma^l\}], \tag{7.10}$$

where we replaced our usual inverse temperature $\beta$ by a new variable $\hat\beta$, since we should not take the derivative with respect to it in Eq. (7.9).

The entropy of the system is very different if $\hat\beta < 1$, in which case the system is in the paramagnetic phase, or if $\hat\beta > 1$, in which case the system is in the ferromagnetic phase. We treat these two cases separately in the following sections.

7.1.1 Ferromagnetic case 

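In the ferromagnetic case, taking without loss of generality $\hat\xi_i = 1$ (thanks to the change of variables leading to Eq. (7.3)), the average of some quantity X over the measurements reads

$$\langle X\rangle = \sum_{\{\sigma^l\}}\prod_{l=1}^{L}\prod_{i=1}^{N}\frac{e^{\beta m^*\sigma_i^l}}{2\cosh(\beta m^*)}\,X(\{\sigma^l\}), \tag{7.11}$$

where $m^*$ is the solution of the equation $m^* = \tanh(\beta m^*)$.

Since $\log\mathcal{N}$ is formally identical to the free energy of the Hopfield model, we start from the saddle-point solution obtained in chapter 2, given in Eqs. (2.35) and (2.36):

$$\log\mathcal{N} = -\frac{\beta N}{2}\sum_{l=1}^{L}m_l^2 + \sum_{i=1}^{N}\log\Big[2\cosh\Big(\beta\sum_{l=1}^{L}m_l\sigma_i^l\Big)\Big] \tag{7.12}$$

and

$$m_l = \frac{1}{N}\sum_{i=1}^{N}\sigma_i^l\tanh\Big(\beta\sum_{l'=1}^{L}m_{l'}\sigma_i^{l'}\Big). \tag{7.13}$$

We recall that the physical meaning of $m_\mu$ in the Hopfield model is the magnetization of the spins along pattern $\mu$. In our case, $m_l$ represents the overlap between the pattern we are inferring and the l-th configuration: $m_l = \frac{1}{N}\sum_i\xi_i\sigma_i^l$.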

Eq. (7.13) always has a solution of the form $\{m_l\} = \{m, m, \dots, m\}$ [Amit 85a]. Since $\{\sigma^l\}$ are measured configurations of an Ising system in the ferromagnetic phase, we expect them to be very similar: they are all close to the same minimum of the free energy of the Ising model. We thus expect the solution $\{m_l\} = \{m, m, \dots, m\}$ to be the minimum of our modified free energy of Eq. (7.12). Under this hypothesis, the entropy reads

$$\langle S\rangle = \sum_{i=1}^{N}\Big\langle\log\Big[2\cosh\Big(\beta m\sum_{l=1}^{L}\sigma_i^l\Big)\Big] - \beta m\sum_{l=1}^{L}\sigma_i^l\tanh\Big(\beta m\sum_{l=1}^{L}\sigma_i^l\Big)\Big\rangle, \tag{7.14}$$

where the value of m that satisfies Eq. (7.13) is $m = m^*$, a fact proven in Appendix E.

Asymptotic behavior 

Note that the entropy of this system is the same as that of a system composed of a single spin $\rho = \pm 1$ coupled to L independent, magnetized spins. The partition function of this single-spin system is

$$Z = 2\cosh\Big(\beta m\sum_{l=1}^{L}\sigma_l\Big), \tag{7.15}$$

and the free energy reads

$$\Big\langle\log\Big[2\cosh\Big(\beta m\sum_{l=1}^{L}\sigma_l\Big)\Big]\Big\rangle, \tag{7.16}$$

where the average is taken over independent spins $\sigma_l$ with magnetization m. To calculate the average entropy of this system, we note that for large x, $\log(2\cosh x) - x\tanh x \approx (1 + 2|x|)\,e^{-2|x|}$. Consequently

$$\langle S\rangle \approx \sum_{k=0}^{L}\binom{L}{k}\frac{e^{\beta m(L - 2k)}}{[2\cosh(\beta m)]^L}\,\big(1 + 2\beta m|L - 2k|\big)\,e^{-2\beta m|L - 2k|}. \tag{7.17}$$
The probability that our single spin $\rho$ is not aligned with its partners is exponentially small in L. Two extreme cases contribute to this probability:

1. The spins $\sigma_l$ obey $\sum_l\sigma_l \approx 0$ (which is very unlikely), and consequently our spin $\rho$ is random.

2. We have the highly probable situation $\sum_l\sigma_l \approx mL$, but our spin $\rho$ is misaligned with the others, which is very unlikely.

Case 1 has a probability $[\cosh(\beta m)]^{-L}$ and gives a $\log 2$ contribution to the entropy, while case 2 has a probability O(1) but gives a contribution of order $e^{-2\beta m^2 L}$ to the entropy. Since for all $\beta$, $\log\cosh(\beta m) < 2\beta m^2$ (note that m is an implicit function of $\beta$), case 1 dominates the behavior of the system for $L \to \infty$. So, for large L, we have the general behavior

$$S \propto e^{-\gamma L}, \tag{7.18}$$

with

$$\gamma = \log\cosh(\beta m). \tag{7.19}$$

We can thus conclude that for $L \gg 1/\gamma$ we can infer the unknown pattern with a very small probability of error.
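The exponential decay of Eq. (7.18) can be explored numerically in the single-spin picture of Eqs. (7.15)-(7.17). The sketch below solves $m^* = \tanh(\beta m^*)$ by fixed-point iteration, computes the exact binomial average of $\log(2\cosh x) - x\tanh x$ with $x = \beta m^*\sum_l\sigma_l$, and compares its decay rate with $\gamma = \log\cosh(\beta m^*)$. The value β = 1.5 and the numbers of measurements are illustrative choices.

```python
import math

def single_spin_entropy(x):
    """log(2 cosh x) - x tanh x, rewritten so that it does not overflow for large |x|."""
    a = abs(x)
    e = math.exp(-2.0 * a)
    return math.log1p(e) + 2.0 * a * e / (1.0 + e)

def magnetization(beta, n_iter=10000):
    """Solution m* of m = tanh(beta m), by fixed-point iteration started from m = 1."""
    m = 1.0
    for _ in range(n_iter):
        m = math.tanh(beta * m)
    return m

def average_entropy_per_spin(beta, n_meas):
    """Exact binomial average of log(2cosh x) - x tanh x with x = beta m* sum_l sigma_l."""
    bm = beta * magnetization(beta)
    log_norm = n_meas * math.log(2.0 * math.cosh(bm))
    total = 0.0
    for k in range(n_meas + 1):                       # k = number of spins pointing down
        log_prob = (math.lgamma(n_meas + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_meas - k + 1) + bm * (n_meas - 2 * k) - log_norm)
        total += math.exp(log_prob) * single_spin_entropy(bm * (n_meas - 2 * k))
    return total

beta = 1.5
gamma = math.log(math.cosh(beta * magnetization(beta)))   # decay rate of Eq. (7.18)
for n_meas in (10, 50, 100, 200):
    print(n_meas, average_entropy_per_spin(beta, n_meas))
```

In the paramagnetic phase ($\beta < 1$) the iteration converges to $m^* = 0$ and the same function returns $\log 2$ for any L, in agreement with section 7.1.2.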

7.1.2 Paramagnetic case 

In the paramagnetic phase we have m = 0. According to Eq. (7.14), the entropy per spin is thus equal to $\log 2$ for all values of L. This is correct under the hypothesis that L remains finite when $N \to \infty$. For $L = \alpha N$, the results of the previous section do not hold, since we cannot use the saddle-point approximation for $\log Z$.

We start from Eq. (2.33),

(7.20)

and we would like to find the average over all possible realizations of the measures $\{\sigma^l\}$, as we did in the ferromagnetic case. The main difference is that now the saddle-point values of $m_l$ are of order $O(1/\sqrt{N})$.

To find the correct solution under these conditions, we need to use the replica trick as explained in section 2.4. The calculations are very similar to those done in the solution of the Hopfield model by Amit et al. [Amit 85b], and the details can be found in Appendix F. The obtained value for the entropy is given by



a(3(l - q ){l-0[2- 2q(l -/?)-(! + t 2 )/?]} a 



2(l-0)[l-(l-q)0\' 



ln[l 



+ (ln 



a 



2 cosh ( —t + z\J aq 



a 



[(l-q)q + tt] , 



;i - Q)0\ 
(7.21) 



where $\langle f\rangle_z = \int f(z)\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz$, we recall that $\alpha = L/N$ and



a 



tanh ( —t + z\J aq 



)>.■ 



i 

20 2 




a 



tanh ( — t + ^a/ ag 



(l-/3)[l-(l-g)/3]' 

/3 2 



;i - fi) [1 - (1 - g)/3] 



2 ' 



(7.22) 

(7.23) 

(7.24) 
(7.25) 



where we recall that t is the overlap between the real and the inferred pattern. One can verify that these equations have a non-zero solution only when $\alpha > \alpha_c$, with $\alpha_c = \big(\frac{1 - \beta}{\beta}\big)^2$. In this case both q and t are non-zero.

When $\alpha < \alpha_c$, t is zero, which implies that the inferred pattern bears no resemblance to the real one. Surprisingly, the entropy decreases linearly with increasing α in this regime, which can possibly be interpreted as a growth of the set of patterns known to be incompatible with the data.

Note that, in contrast with the ferromagnetic case, inferring the patterns in the paramagnetic phase takes a number of measures proportional to the size of the system, implying that it is much harder to extract information from it.



Numerical verification 



Figure 7.2 shows the behavior of the entropy as a function of α for a fixed inverse temperature $\beta = 0.5$.

Figure 7.2: Entropy per spin as a function of α for $\beta = 0.5$. The solid line corresponds to Eqs. (7.21)-(7.25), while the points correspond to numerical values obtained by exact enumeration. The dashed line corresponds to the behavior of $\langle S\rangle$ in the regime $q = t = 0$, while the vertical line indicates $\alpha_c$.



To evaluate $\langle S\rangle$ numerically as shown in Fig. 7.2, we used the following algorithm:

1. Evaluate Z by exact enumeration;

2. Generate $L = \alpha N$ configurations $\{\sigma^l\}$ according to the Boltzmann weight by rejection sampling;

3. Evaluate $\mathcal{N}[\{\sigma^l\}]$ by exact enumeration;

4. Evaluate $\partial\mathcal{N}[\{\sigma^l\}]/\partial\beta$ by exact enumeration.

For every α, we repeated this procedure one hundred times with different random seeds, which gives different configurations $\{\sigma^l\}$ in step 2. The points in the graph correspond to the averages of the obtained values of S, and the error bars were calculated using the standard deviation. The result of this procedure supports our analytical results. Indeed, in Figure 7.2 we notice that the larger values of N are much closer to the analytical curve, which suggests that the difference between the analytical and numerical results is due to the small values of N used in the numerical calculations.
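The four steps above can be sketched for small N as follows. This is a hypothetical re-implementation, not the code used for Fig. 7.2. The real pattern is taken to be all +1, which the change of variables of Eq. (7.3) allows; the rejection step only needs the largest Boltzmann weight, so the Z of step 1 is not required in this particular scheme; and the β-derivative of step 4 is computed exactly as the posterior average of the exponent, since $\partial_\beta\log\mathcal{N} = \langle E\rangle$.

```python
import itertools
import numpy as np

def sample_configs(beta, n, n_samples, rng):
    """Step 2: rejection sampling from exp((beta/N) sum_{i<j} s_i s_j) (real pattern = all +1)."""
    log_w_max = beta * (n * n - n) / (2.0 * n)       # largest weight, reached at |sum_i s_i| = N
    out = []
    while len(out) < n_samples:
        s = rng.choice([-1.0, 1.0], size=n)           # uniform proposal
        total = s.sum()
        log_w = beta * (total * total - n) / (2.0 * n)
        if np.log(rng.random()) < log_w - log_w_max:
            out.append(s)
    return np.array(out).reshape(n_samples, n)

def inference_entropy(beta, samples):
    """Steps 3-4: S = log N - beta dlogN/dbeta of Eq. (7.7), by exact enumeration over xi."""
    n = samples.shape[1]
    patterns = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    energy = (((patterns @ samples.T)**2 - n) / 2.0).sum(axis=1) / n
    a = beta * energy
    w = np.exp(a - a.max())
    log_norm = a.max() + np.log(w.sum())
    mean_energy = (w * energy).sum() / w.sum()        # d log N / d beta
    return log_norm - beta * mean_energy

rng = np.random.default_rng(0)
n, beta, alpha = 10, 0.5, 1.0
n_samples = int(round(alpha * n))
values = [inference_entropy(beta, sample_configs(beta, n, n_samples, rng)) / n
          for _ in range(100)]                        # 100 repetitions, as in the text
print(np.mean(values), np.std(values))               # entropy per spin and its spread
```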
7.2 Case of a single pattern $\xi_i \in \mathbb{R}$

The results of the previous section are valid only if our pattern takes bimodal values, which might be a very particular case. To check whether the inference of a real-valued pattern presents any qualitative difference with respect to the bimodal case, we evaluate explicitly the entropy for the real-valued case in this section.

The calculations are similar to those of the last section, but with the additional complication that the partition function depends on the exact values of the pattern, whereas in the bimodal case it depended only on the temperature. The partition function is given by

$$\log Z[\{\xi_i\}] = -\frac{N}{2}m[\{\xi_i\}]^2 + \sum_i\log[2\cosh(m\xi_i)], \tag{7.26}$$

with

$$m[\{\xi_i\}] = \frac{1}{N}\sum_i\xi_i\tanh(\xi_i\,m[\{\xi_i\}]). \tag{7.27}$$

Since $\xi_i$ is continuous, we must modify the definition of the entropy, replacing the sum by an integral:

$$S[\{\sigma^l\}] = -\int\prod_i d\xi_i\;P[\{\xi_i\} \mid \{\sigma^l\}]\,\log P[\{\xi_i\} \mid \{\sigma^l\}]. \tag{7.28}$$

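Using Bayes' theorem, we obtain

$$S[\{\sigma^l\}] = -\int\prod_i d\xi_i\;\frac{P_0(\{\xi_i\})}{Z(\{\xi_i\})^L\,\mathcal{M}[\{\sigma^l\}]}\exp\Big(\frac{1}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l\Big)\Big[\frac{1}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - \log\mathcal{M}[\{\sigma^l\}] - L\log Z(\{\xi_i\}) + \log P_0(\{\xi_i\})\Big], \tag{7.29}$$

where $\mathcal{M}[\{\sigma^l\}]$ is given by

$$\mathcal{M}[\{\sigma^l\}] = \int\prod_i d\xi_i\;P_0(\{\xi_i\})\exp\Big(\frac{1}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - L\log Z(\{\xi_i\})\Big).$$

As in the previous section, we would like to write S as a derivative of the normalization. For that, we define a modified normalization $\mathcal{N}$ by introducing a parameter $\beta$ in $\mathcal{M}$:

$$\mathcal{N}[\{\sigma^l\}, \beta] = \int\prod_i d\xi_i\;\exp\Big[\beta\Big(\frac{1}{N}\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l + \log P_0(\{\xi_i\}) - L\log Z(\{\xi_i\})\Big)\Big]. \tag{7.32}$$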






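Note that while $\beta$ is similar to an inverse temperature, it is not strictly one, since it also multiplies the logarithm of the partition function. Using this definition, we can rewrite our entropy as

$$S[\{\sigma^l\}] = \log\mathcal{N}[\{\sigma^l\}, 1] - \frac{\partial\log\mathcal{N}[\{\sigma^l\}, \beta]}{\partial\beta}\bigg|_{\beta = 1}. \tag{7.33}$$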



An additional difficulty of the continuous case is the dependence of $\log Z$ on $m[\{\xi_i\}]$, which depends on the patterns implicitly through Eq. (7.27). To make this dependence explicit, we introduce the identity

$$1 = \int dm\int\frac{N\,dx}{2\pi}\exp\Big[-iNxm + ix\sum_i\xi_i\tanh(m\xi_i)\Big], \tag{7.34}$$

which is just Dirac's delta written in integral form. We can thus write the $e^{-\beta L\log Z}$ term of Eq. (7.32) as

$$e^{-\beta L\log Z} = \int dm\int\frac{N\,dx}{2\pi}\exp\Big[\frac{\beta LN}{2}m^2 - \beta L\sum_i\log[2\cosh(m\xi_i)] - iNxm + ix\sum_i\xi_i\tanh(m\xi_i)\Big]. \tag{7.35}$$

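As in the previous section, we would like to evaluate the average of the entropy over the different possible measurements and the different real patterns:

$$\langle S\rangle = \int\prod_i d\hat\xi_i\;P_0(\{\hat\xi_i\})\sum_{\{\sigma^l\}}\frac{\exp\Big(\frac{1}{N}\sum_{l=1}^{L}\sum_{i<j}\hat\xi_i\hat\xi_j\sigma_i^l\sigma_j^l\Big)}{Z(\{\hat\xi_i\})^L}\,S[\{\sigma^l\}], \tag{7.36}$$

where we impose that the distribution of the unknown real patterns $\{\hat\xi_i\}$ is the same as our prior distribution $P_0(\{\xi_i\})$.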

The details of the replica calculation, similar to the last section, can be found in 
Appendix G. It yields 



1 

N 



i. 
2 



/3Lm 2 



i=i 



+ 



exp 



/ dfp (f)]C 
x log { dt; exp 



f^^-Llog(2cosh(mf)) 
i=i 

L 

P ^ - f 31 lo §( 2 cosh (^0) + P iogp (0 



(7.37) 



i=i 
where



m 



m 



-I 



dfp (O |tanh(mO = (|tanh(m|)) , 



(7.38) 



/ 



exp 



m£(x z — L log 2 cosh(m£) 



x 



X- 



/ d£p (0£ tanh ( m O ex P 


£f =1 Q^-Llog2cosh(m£) 


/ d£p (O ex P [E?=i 


Q'tr'f -Llog2cosh(mf) 





(7.39) 



Qfc = d ^Yl exp 
J M 



m^o - ' — L log 2 cosh(m£) 



z=i 



x 



x 



/ dCPo(0^'exp 


Ef=i gVe-Llog2cosh(mO 


/ d£p (O ex P 


£f =1 QW£-Llog2cosh(m£) 





(7.40) 



and $\prod_i p_0(\xi_i) = P_0(\{\xi\})$ is both the prior probability of the pattern and the distribution used to average over the real pattern, the two being assumed identical.

From this last equation it is straightforward to evaluate the entropy using Eq. (7.33). 



7.3 Case of a system with p patterns, magnetized following a strong external field

In this section we consider the same conditions as in section 6.2.3 for the inference. We have a Hopfield model with p patterns and a local external field strong enough that the system is not magnetized along any of the patterns. The Hamiltonian is thus the same as in Eq. (2.32). The calculations are very similar to those of section 7.4 and are given in Appendix I.

We find that the entropy associated with each of the patterns is described by exactly the same equations as for a single pattern in the paramagnetic phase (section 7.1.2). The behavior is therefore the same: the number of measurements needed is proportional to the size of the system. The exact form of the entropy is given by Eqs. (7.21)-(7.25).



7.4 A particular case: a system with two patterns, magnetized according to the first

In the previous section, we saw that when our system does not visit any of the patterns, inferring each of them is as hard as for a system with a single pattern in the paramagnetic phase. In this section, we check whether anything changes qualitatively when the system is magnetized along one pattern but we want to infer a second, non-visited one.
We suppose that our system's Hamiltonian is given by

$$H(\{\sigma\}) = -\frac{1}{N} \sum_{\mu=1,2} \sum_{i<j} \xi_i^\mu \xi_j^\mu \sigma_i \sigma_j ,$$

where the patterns $\xi_i^\mu = \pm 1$ are binary. We suppose that we performed $L = \alpha N$ measurements of the configuration of this system, all of them while the system was magnetized along the first pattern.

We assume that the first pattern is perfectly known. This is reasonable because the inference of the first pattern is very similar to the ferromagnetic case seen previously, so the entropy of its inference decays exponentially with $L$. We then study the entropy of the a posteriori distribution of the second pattern:

$$S = -\sum_{\{\xi^2\}} P[\{\xi^2\} | \{\xi^1\}, \{\sigma^l\}] \log P[\{\xi^2\} | \{\xi^1\}, \{\sigma^l\}] . \tag{7.41}$$

As we did for the single-pattern case in section 7.1, we average this entropy over both the first pattern and the measured configurations. The probability is given by



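$$P[\{\xi^2\} | \{\xi^1\}, \{\sigma^l\}] = \frac{P[\{\sigma^l\} | \{\xi^1\}, \{\xi^2\}]\, P[\{\xi^2\}]}{\mathcal{N}} \tag{7.42}$$

$$= \frac{1}{\mathcal{N}} \exp\left[ \frac{\beta}{N} \sum_{\mu=1,2} \sum_{l=1}^{L} \sum_{i<j} \xi_i^\mu \xi_j^\mu \sigma_i^l \sigma_j^l - L \log Z_{\rm Hop}(\beta, \{\xi^1\}, \{\xi^2\}) \right] P[\{\xi^2\}] , \tag{7.43}$$

where the normalization $\mathcal{N}$ of the probability is given by

$$\mathcal{N} = \sum_{\{\xi^2\}} \exp\left[ \frac{\beta}{N} \sum_{\mu=1,2} \sum_{l=1}^{L} \sum_{i<j} \xi_i^\mu \xi_j^\mu \sigma_i^l \sigma_j^l - L \log Z_{\rm Hop}(\beta, \{\xi^1\}, \{\xi^2\}) \right] P[\{\xi^2\}] . \tag{7.44}$$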



As in Section 7.1, we write our entropy as a derivative of a modified normalization, given by



I 

N 



EEE-Ki- 



H=l,2 1=1 i<j 



LlogZHoptMC 1 },^}] 



(7.45) 



so that we can easily deduce the entropy as 



S = log N 



d log N 



P=P d/3 



(7.46) 



/3=/3 



where we have assumed $P[\{\xi^2\}] = 2^{-N}$. We now evaluate this quantity explicitly.
A calculation very similar to the derivation of Eq. (6.29) gives the partition function of the Hopfield model with $p = 2$ for a system magnetized along the first pattern:

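$$\log Z_{\rm Hop} = -\frac{\beta N}{2} m^{*2} + N \log[2\cosh(\beta m^*)] - \log\left[1 - \beta(1-m^{*2})\right] + \frac{\beta m^{*2}}{2N} \frac{\left(\sum_i \xi_i^1 \xi_i^2\right)^2}{1 - \beta(1-m^{*2})} ,$$

with $m^*$ given by $m^* = \tanh(\beta m^*)$.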

7.4.1 Evaluation of $\langle \log \mathcal{N} \rangle$

To evaluate $\langle \log \mathcal{N} \rangle$, we again use the replica trick, obtaining

(N n ) 



(7.47) 
+ P E E + 5§) - i ^Hop (3, {I 1 }, {| 2 }



=1 ij 



(7.48) 



where we denote by $\xi^2$ the real second pattern and by $\hat\xi^2$ the inferred one. This expression can be simplified to (see Appendix H)
A= ( y/N(m* - m*)(ftn + ft) m*/3u m*/3s 1 m*/3s 2



m 



*fts n ) , 



and 



d l 
d 2 

d 3 
u v 



= (ftn + ft) [l - {ftn + £)(!- m* 2 )] , 



ft 
ft 



1-/3(1 -m* 2 ) 



1-/3(1 -m* 2 ) 



= -(l-m* 2 )ft(ftn + ft)u„, 

= -(l-m* 2 )ftftU, 

= -(l-m* 2 )ft(ftn + ft)s„, 

= -(l-m*W- 



Interestingly, in the replica-symmetric case the entropy of this system is exactly the same as in section 7.1.2 (Eqs. (7.21)-(7.25)), but with an effective inverse temperature $\beta' = \beta(1 - m^{*2})$. This shows that the behavior seen in the previous section remains valid in this case.
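For a given inverse temperature $\beta > 1$, the effective inverse temperature $\beta' = \beta(1 - m^{*2})$ can be evaluated numerically by first solving $m^* = \tanh(\beta m^*)$. The sketch below does this by fixed-point iteration starting from $m = 1$; the function names are ours.

For the stable non-zero solution, the slope of $\tanh(\beta m)$ at $m^*$ is $\beta(1 - m^{*2}) < 1$. One therefore always finds $\beta' < 1$, which is consistent with the paramagnetic-like behavior described above. The same quantity also sets the convergence rate of the iteration.

```python
import math

def magnetization(beta, tol=1e-14, max_iter=100_000):
    """Non-zero solution of m* = tanh(beta m*) for beta > 1 (iteration from m = 1)."""
    m = 1.0
    for _ in range(max_iter):
        m_new = math.tanh(beta * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def effective_beta(beta):
    """Effective inverse temperature beta' = beta (1 - m*^2) seen by the second pattern."""
    m = magnetization(beta)
    return beta * (1.0 - m * m)

for beta in (1.5, 2.0, 3.0):
    print(beta, magnetization(beta), effective_beta(beta))
```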
Chapter 8
Conclusion 



In this work we have studied the general problem of extracting information from the measured activity of interacting parts. This problem recurs in several domains, such as biology (with the study of neuron networks and proteins), physics, social science and economics.

To solve such a problem, one needs to make an assumption about the underlying system that generated the data. Our results can thus be separated into two main parts.

In the first part, we assumed that our system is described by a generalized Ising model, i.e., an Ising model with a Hamiltonian given by $H(\{\sigma_i\}) = -\sum_{i<j} J_{ij}\sigma_i\sigma_j - \sum_i h_i\sigma_i$. In this case, the underlying information to be extracted consists of the couplings $J_{ij}$ and the external fields $h_i$ of the Hamiltonian.

The second part of our results deals with the case where the underlying system is described by the Hopfield model with $p$ patterns. Its Hamiltonian is given by $H(\{\sigma_i\}) = -\frac{1}{N}\sum_{i<j}\sum_{\mu=1}^{p} \xi_i^\mu\xi_j^\mu\sigma_i\sigma_j - \sum_i h_i\sigma_i$. In this case, the information to be obtained is the values of the patterns $\{\xi_i^\mu\}$ and of the external fields $h_i$.

In part II of this thesis, we derived an explicit formula for the couplings $J_{ij}$ and external fields $h_i$ as a function of the magnetizations $m_i = \langle\sigma_i\rangle$ and the connected correlations $c_{ij} = \langle\sigma_i\sigma_j\rangle - m_i m_j$. This formula was obtained through a small-correlation expansion and was evaluated up to third order in the correlations. We also developed a general method through which the expansion can be continued up to any desired order.

Unfortunately, the performance of our third-order approximation degraded very quickly as the correlations increased. To work around this limitation, we identified some terms of our expansion that, once grouped together, correspond exactly to the first terms of a mean-field approximation already known in the literature. We could then replace these terms by the full mean-field approximation, which gave a formula much more robust for large correlations.

Moreover, we identified a second set of terms with a simple interpretation: all the terms that involve only two sites can be shown to represent the exact solution of a system composed of only two spins. In the same way, we replaced these terms by their resummed expression.

Finally, we obtained a fairly simple formula that is numerically very stable and has a straightforward interpretation. It unifies the contribution of a mean-field approximation, which works very well in systems with many small couplings, with an independent-pair approximation, which performs well in systems with few but strong couplings. We verified that our formula outperforms existing methods for the inverse Ising model.

A possible future extension of this work would be to find out whether other methods of solving the inverse Ising problem can be interpreted in the framework of our expansion. For instance, the susceptibility propagation method is exact on trees. An interesting perspective would be to associate this method with the exact sum of all the terms that do not vanish in this case; in the diagrammatic notation introduced in chapter 5, these should correspond to diagrams that contain no loops. It might then be interesting to add to our formula the exact sum of all those terms.

Another promising avenue of research would be to understand what dominates the error of our formula for systems with small couplings, beyond the loop summation discussed in chapter 5.

Unfortunately, our results have a few limitations. First of all, they do not work for a system in a ferromagnetic or glassy state, or for any other system with strong correlations. Secondly, while our small-correlation expansion can in principle be carried out to any given order, the calculations become impractically long beyond order four.

In the last part of this thesis we dealt with the inference of the patterns of a Hopfield model. This approach has several potential advantages:

- The Hopfield model can be solved analytically, which should make our calculations more precise and simpler.
- For a fixed number of patterns $p$, we have only $pN$ real values to infer, while for the inverse Ising problem we had all the $N(N-1)/2$ elements of the matrix $J_{ij}$. Having fewer degrees of freedom can improve the quality of the inference when the input data is noisy, since it can avoid "fitting the noise".
- This model is potentially valid also in the ferromagnetic phase.

To infer the patterns from the data, we used a Bayesian approach: we looked for the set of patterns that maximizes the a posteriori probability. We found an explicit expression for the patterns in terms of the measured correlations and magnetizations, exact in the limit of large system size.

Since the Hopfield model is a particular case of the generalized Ising model, we could compare this formula with our previous results and show that it corresponds to the mean-field approximation of part II. The formula also corresponds to a well-known method for extracting patterns from data: Principal Component Analysis (PCA). Our calculations thus provide a rigorous justification for this method.

To find a formula that gives a better inference than the mean-field one, we evaluated the first subleading correction to the patterns. The resulting formula works well with synthetic data obtained by exact enumeration but quickly breaks down for noisy input.

Besides finding an explicit formula for the patterns, we wanted to evaluate how much data is needed to obtain a precise estimate of the patterns. For that, we computed the information-theoretic entropy of the inference for a typical realization of the measurements.

We find that if the system is magnetized along a given pattern, the quality of our estimate of that pattern improves exponentially with the number of measurements. Interestingly, in the limit of a very large system, the number of measurements needed does not depend on the size of the system. On the other hand, if we are looking for a pattern along which none of the measured configurations were magnetized, the number of measurements needed is proportional to the size of the system.

An interesting perspective for future work would be to apply the Hopfield results to the analysis of families of homologous proteins, where both PCA and a Hopfield-like Hamiltonian have been shown to yield interesting results.

A final remark is that our results can only be used to extract information from data consisting of independent samples from a Gibbs distribution, which is usually not the case in biological experiments. More precisely, a fundamental premise of our derivations is that the probability of $L$ measured configurations is just the product of $L$ Boltzmann weights.

Concerning experimental data, neuronal activity usually presents strong temporal correlations. As the free-energy landscape potentially has many minima, the measured configurations might correspond to a very particular subset of all the states, in which case the measurements are not independent.

In the case of protein families, the situation is particularly bad: because of their common evolutionary origin, there is a strong bias towards sampling proteins similar to their common ancestor. In future investigations, it would be interesting to work around this problem, or at least to see to what extent it interferes with the inference procedure. One possibility would be to take into account two contributions to the probability: a term associated with the fitness of the protein, similar to what was done here, and a term that accounts for the evolutionary history of the family.
Recently, a large number of biological experiments generating very large amounts of data have appeared. In a considerable fraction of these experiments, neural networks among them, data analysis consists roughly of identifying the correlations between the different parts of the system.

Unfortunately, correlations by themselves have only limited scientific value: most of the interesting properties of the system are described instead by the interactions between its different parts. The goal of this work is to develop tools for determining the interactions between the different parts of a system as a function of its correlations.

In this French-language summary, we begin with an introduction presenting classical results on the main system that motivated this work: neural networks. We then briefly discuss the modeling chosen for this work and the known results on similar systems.

In a second part, we present a small-correlation expansion of the inverse Ising problem. Finally, in a last part we treat the inverse Hopfield problem, i.e., finding the patterns of the model from the local fields and correlations.



Introduction 



One of the most important scientific questions of the 21st century is understanding the brain. It is now well known that the complexity of the brain is a product of the organization of its cells (the neurons) into complex networks.

A typical neuron is composed of three parts (see figure):

- a cell body, which contains the cell nucleus;
- dendrites, responsible for receiving signals from other cells;
- an axon, which sends signals to other neurons.

The connections between neurons are called synapses and typically occur between an axon and a dendrite.
Figure 8.1: Diagram of a neuron [Alberts 02]. The diameter of the cell body is typically 10 μm, while the size of the dendrites and axons varies considerably with the function of the neuron.



Like all cells, neurons maintain a potential difference between their cytoplasm and the extracellular medium. This potential difference is controlled by ion-pump mechanisms, which can increase or decrease it.

When the potential difference of a neuron reaches a certain threshold, a feedback mechanism activates the pumps, making the potential rise rapidly to about 100 mV (depending on the type of neuron). The potential then saturates and decreases, returning to the resting potential of the cell. This process is called a spike.

Each time a neuron emits a spike, its axon releases neurotransmitters onto the cells it is connected to, changing their potential difference. Since synapses can be excitatory or inhibitory, models usually assign a weight to each synapse. By convention, a positive weight corresponds to a synapse that increases the potential of the neuron it connects to (and therefore favors spikes), and a negative weight to a synapse that decreases this potential.

A very promising new avenue of research in neuroscience has been the development of multi-neuron recording techniques. In these experiments, an array of micro-electrodes (containing up to 250 electrodes) is placed in contact with brain tissue, and the potential at each electrode is measured for a few hours. A sophisticated data-analysis procedure then identifies the individual activity of each neuron in contact with the electrodes. The typical output of a multi-neuron recording is shown in the following figure.
Figure 8.2: Typical result of a multi-electrode recording experiment [Peyrache 09]. Each line corresponds to a single neuron, and the vertical bars correspond to spikes.



In principle, the synapses could be found from multi-electrode recordings, but extracting this information is not trivial. Naively, one would be tempted to say that if the activities of two neurons are correlated, they are connected by a synapse. However, consider three neurons whose activities are correlated: it is easy to see that both configurations shown in the following figure can account for these correlations.




Figure 8.3: Two possible configurations for three correlated neurons.



To contribute to this problem, one must first choose a model for neural networks. The model of interest for this thesis is the Boltzmann machine [McCulloch 43]. In this model, the state of a neuron is described by a binary variable: $\sigma = +1$ if it is emitting a spike, $\sigma = -1$ otherwise. The network of $N$ neurons is then described by a vector $\{\sigma_1, \ldots, \sigma_N\}$.

For simplicity, the dynamics of the system is ignored entirely. The model only describes the probability $P(\{\sigma_1, \ldots, \sigma_N\})$ of finding the system in a state $\{\sigma_1, \ldots, \sigma_N\}$, which is given by the Boltzmann weight of a generalized Ising model:

$$P(\{\sigma_1, \ldots, \sigma_N\}) = \frac{1}{Z} e^{-\beta H(\{\sigma_1, \ldots, \sigma_N\})} , \tag{8.1}$$
with

$$Z = \sum_{\{\sigma\}} e^{-\beta H(\{\sigma_1, \ldots, \sigma_N\})} , \tag{8.2}$$

where $Z$ is the partition function of the model and $\beta$ is a parameter of the model that, in the context of spins, corresponds to the inverse temperature. We also introduce the notation

$$\sum_{\{\sigma\}} = \sum_{\sigma_1=\pm 1} \sum_{\sigma_2=\pm 1} \cdots \sum_{\sigma_N=\pm 1} . \tag{8.3}$$

The Hamiltonian must take into account the synapses between the neurons and the fact that a neuron must receive a certain minimum amount of signal to emit a spike. The most commonly used expression is

$$H(\{\sigma_1, \ldots, \sigma_N\}) = -\sum_{i<j} J_{ij} \sigma_i \sigma_j - \sum_i h_i \sigma_i , \tag{8.4}$$

where $J_{ij}$ corresponds to the synaptic weights and $h_i$ is a term that models the spike threshold as a field pulling the neuron towards its resting state.

This model is interesting for this work for two reasons. First, it allows the tools developed in the context of statistical mechanics and disordered systems to be applied directly. Second, this model emerges naturally when one looks for a model that can account for a set of averages $\langle\sigma_i\rangle$ and correlations $\langle\sigma_i\sigma_j\rangle$. Note that in this model the couplings are symmetric, i.e., $J_{ij} = J_{ji}$, which is not necessarily true in biological systems.

The generalized Ising model contains several classical models of the literature as particular cases. We can cite the ordinary Ising model, the Sherrington-Kirkpatrick model and, of particular interest for this thesis, the Hopfield model. In this model, the Hamiltonian is given by

$$H = -\frac{N}{2} \sum_{\mu=1}^{p} \left( \frac{1}{N} \sum_i \xi_i^\mu \sigma_i \right)^2 , \tag{8.5}$$

where the $\xi_i^\mu$ are real values. Expanding the square, one sees that, up to an additive constant, it corresponds to the generalized Ising model with

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu , \qquad h_i = 0 . \tag{8.6}$$

The idea behind the Hopfield model is to describe a system that stores a number $p$ of memories and can retrieve them from an initial state similar to the memory sought. Indeed, Eq. (8.5) shows that if the memories $\xi^\mu$ are approximately orthogonal to one another, the state $\sigma_i = \xi_i^\mu$ corresponds to a local minimum of the energy. A classical result (see [Amit 85a]) is that, under certain conditions, these states are also local minima of the free energy. This is what allows one to say that the system can retrieve these memories.

Determining the behavior of a system described by a generalized Ising model is usually very difficult, and there is no general solution to this problem. One can, however, find approximate solutions within a certain range of validity. A case of particular interest to us is the small-coupling expansion introduced by Thouless, Anderson and Palmer (the so-called TAP equations) [Thouless 77]. Their result states that for a system with a Hamiltonian given by

$$H = -\sum_{i<j} J_{ij} \sigma_i \sigma_j , \tag{8.7}$$

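the free energy at $\beta = 1$ is given by

$$-F = -\sum_i \left[ \frac{1+m_i}{2} \ln\frac{1+m_i}{2} + \frac{1-m_i}{2} \ln\frac{1-m_i}{2} \right] + \sum_{i<j} J_{ij} m_i m_j + \frac{1}{2} \sum_{i<j} J_{ij}^2 (1-m_i^2)(1-m_j^2) + O(J^3) , \tag{8.8}$$

where $m_i = \langle\sigma_i\rangle$ is the local magnetization of the system, which according to the TAP equations obeys

$$\tanh^{-1} m_i = \sum_j J_{ij} m_j - m_i \sum_j J_{ij}^2 (1-m_j^2) . \tag{8.9}$$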
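The TAP condition $\tanh^{-1} m_i = \sum_j J_{ij} m_j - m_i \sum_j J_{ij}^2 (1-m_j^2)$ is commonly solved by iteration. The sketch below iterates $m_i \leftarrow \tanh\big(h_i + \sum_j J_{ij} m_j - m_i \sum_j J_{ij}^2(1-m_j^2)\big)$ with damping. The function names, the damping value and the starting point are our own choices.

The field $h_i$ is an assumed extension: the equation above corresponds to $h_i = 0$. The coupling matrix is assumed symmetric with zero diagonal. Convergence is only expected for weak couplings, which is also where the expansion is valid.

```python
import numpy as np

def tap_field(J, m, h):
    """Right-hand side of the TAP equation, plus an optional external field h."""
    return h + J @ m - m * ((J ** 2) @ (1.0 - m ** 2))

def solve_tap(J, h=None, damping=0.5, tol=1e-12, max_iter=10_000):
    """Damped fixed-point iteration m <- tanh(h + J m - m (J**2)(1 - m**2)).

    J: symmetric coupling matrix with zero diagonal (sums run over j != i).
    h: optional field; h = 0 reproduces the TAP equation as written above."""
    J = np.asarray(J, dtype=float)
    n = J.shape[0]
    h = np.zeros(n) if h is None else np.asarray(h, dtype=float)
    m = np.full(n, 0.1)  # arbitrary starting point
    for _ in range(max_iter):
        m_new = (1 - damping) * m + damping * np.tanh(tap_field(J, m, h))
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    raise RuntimeError("TAP iteration did not converge")

rng = np.random.default_rng(1)
J = 0.1 * rng.standard_normal((6, 6))
J = (J + J.T) / 2          # symmetric couplings
np.fill_diagonal(J, 0.0)   # no self-coupling
h = 0.2 * rng.standard_normal(6)
m = solve_tap(J, h)
print(np.max(np.abs(np.arctanh(m) - tap_field(J, m, h))))  # residual of the TAP equation
```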

Unfortunately, in experiments on neurons, what is needed is not a method for finding the behavior of a system as a function of its parameters, but the opposite: finding the parameters that best account for the observed results. One then speaks of "inverse problems".

Working with inverse problems brings two additional complications. First, even in cases where the direct problem can be solved, finding the parameters that describe the data is usually a hard problem. Second, giving a mathematical meaning to the expression "best describes the system" is also delicate. It is possible that several different choices of parameters describe the measurements exactly. The opposite situation, in which no set of parameters accounts for the data (because of experimental errors), is just as possible.

To deal with this difficulty, it is useful to postulate that the parameters describing the problem also follow a statistical law. One can then apply Bayes' theorem, which states that the probability that the parameters take the values $\{\lambda_i\}$ given the measurements $\{X_i\}$ is

$$P(\{\lambda_i\} | \{X_i\}) = \frac{P(\{X_i\} | \{\lambda_i\})\, P_0(\{\lambda_i\})}{P(\{X_i\})} , \tag{8.10}$$

where $P_0(\{\lambda_i\})$ is the prior probability of the parameters and $P(\{X_i\})$ is the marginal probability of $\{X_i\}$. The latter can also be interpreted as the normalization of the probability $P(\{\lambda_i\} | \{X_i\})$:

$$P(\{X_i\}) = \sum_{\{\lambda_i\}} P(\{X_i\} | \{\lambda_i\})\, P_0(\{\lambda_i\}) . \tag{8.11}$$

Ce theoreme permet de donner une definition precise du "meilleur" ensemble de 
parametres pour decrire le systeme comme ceux qui maximisent P({Xi}\{Xi}). A 
part cela, s'il existe plus d'un ensemble de parametres qui rendent compte des don- 
nees, un choix judicieux de P ({Aj}) permet de "choisir" parmi ces solutions et de 
rendre le probleme bien definit. 



The inverse Ising problem

We consider the generalized Ising model with a Hamiltonian given by

$$H(\{\sigma_i\}) = -\sum_{i<j} J_{ij} \sigma_i \sigma_j - \sum_i h_i \sigma_i , \tag{8.12}$$

which defines a probability over the states following the Boltzmann law

$$P(\{\sigma_i\}) = \frac{1}{Z} e^{-H(\{\sigma_i\})} , \tag{8.13}$$

where $Z$ is the partition function, given by

$$Z = \sum_{\{\sigma\}} e^{-H(\{\sigma_i\})} . \tag{8.14}$$

Usually, when studying such a system, one seeks to determine the local magnetizations $m_i = \langle\sigma_i\rangle$ and the two-site correlations $c_{ij} = \langle\sigma_i\sigma_j\rangle - m_i m_j$ as a function of the couplings $\{J_{ij}\}$ and fields $\{h_i\}$. By contrast, one speaks of the inverse Ising problem when one seeks to determine $J_{ij}$ and $h_i$ as a function of the correlations and magnetizations.
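For small systems, the direct problem (from $\{J_{ij}\}, \{h_i\}$ to $\{m_i\}, \{c_{ij}\}$) can be solved exactly by enumerating all $2^N$ configurations of the Boltzmann law $P \propto e^{-H}$ with $H = -\sum_{i<j} J_{ij}\sigma_i\sigma_j - \sum_i h_i\sigma_i$. The sketch below does this; the function name is ours. Its cost grows as $2^N$, which is precisely why the inverse problem calls for approximations.

```python
import itertools
import numpy as np

def ising_moments(J, h):
    """Exact magnetizations m_i and connected correlations c_ij of
    P(s) ∝ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i), by brute-force enumeration.
    J must be symmetric with zero diagonal; the cost grows as 2**N."""
    J = np.asarray(J, dtype=float)
    h = np.asarray(h, dtype=float)
    n = len(h)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    # 0.5 * s^T J s equals sum_{i<j} J_ij s_i s_j for symmetric J with zero diagonal.
    neg_energy = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    weights = np.exp(neg_energy - neg_energy.max())  # shift for numerical stability
    p = weights / weights.sum()
    m = p @ states
    second = states.T @ (states * p[:, None])  # <s_i s_j>
    return m, second - np.outer(m, m)

m, c = ising_moments([[0.0, 0.4], [0.4, 0.0]], [0.0, 0.0])
print(m, c[0, 1])  # two coupled spins without field: c_12 = tanh(0.4)
```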

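Our starting point for solving this problem is the Shannon entropy of the problem (for more details, see Chapters 3 and 4):

$$\begin{aligned} S(\{J_{ij}\}, \{\lambda_i\}; \{m_i\}, \{c_{ij}\}) &= \log Z(\{J_{ij}\}, \{h_i\}) - \sum_{i<j} J_{ij}(c_{ij} + m_i m_j) - \sum_i h_i m_i \\ &= \log \sum_{\{\sigma\}} \exp\Big\{ \sum_{i<j} J_{ij} (\sigma_i\sigma_j - c_{ij} - m_i m_j) + \sum_i h_i (\sigma_i - m_i) \Big\} \\ &= \log \sum_{\{\sigma\}} \exp\Big\{ \sum_{i<j} J_{ij} \big[ (\sigma_i - m_i)(\sigma_j - m_j) - c_{ij} \big] + \sum_i \lambda_i (\sigma_i - m_i) \Big\} , \end{aligned} \tag{8.15}$$

where we introduced new external fields $\lambda_i$, related to the true fields $h_i$ by $\lambda_i = h_i + \sum_j J_{ij} m_j$. By computing $\partial S/\partial J_{ij}$ and $\partial S/\partial h_i$, one easily sees that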
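the set of $J^*_{ij}$ and $\lambda^*_i$ that reproduces the magnetizations $m_i$ and correlations $c_{ij}$ is the one that minimizes $S$. We will also be interested in the value of $S$ at its minimum,

$$S(\{m_i\}, \{c_{ij}\}) = \min_{\{J_{ij}\}} \min_{\{\lambda_i\}} S(\{J_{ij}\}, \{\lambda_i\}; \{m_i\}, \{c_{ij}\}) , \tag{8.16}$$

since from this expression one can extract the values of $J^*_{ij}$ and $\lambda^*_i$ using

$$\frac{\partial S(\{m_i\}, \{\beta c_{ij}\})}{\partial(\beta c_{ij})} = -J^*_{ij}(\beta) , \tag{8.17}$$

$$\frac{\partial S(\{m_i\}, \{\beta c_{ij}\})}{\partial m_i} = -\lambda^*_i(\beta) . \tag{8.18}$$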
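Since $S = \log Z - \sum_{i<j} J_{ij}(c_{ij} + m_i m_j) - \sum_i h_i m_i$ is convex in the couplings and fields, for small systems its minimum can be found by plain gradient descent. The gradients are $\partial S/\partial J_{ij} = \langle\sigma_i\sigma_j\rangle_{J,h} - (c_{ij} + m_i m_j)$ and $\partial S/\partial h_i = \langle\sigma_i\rangle_{J,h} - m_i$.

This is the classic Boltzmann-machine learning rule. It is shown here only as a brute-force reference and is not the small-correlation expansion developed in this part. The step size and function names are our own choices, and exact enumeration limits it to small $N$.

```python
import itertools
import numpy as np

def model_moments(J, h):
    """Exact <s_i> and <s_i s_j> of P(s) ∝ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i)."""
    s = np.array(list(itertools.product([-1.0, 1.0], repeat=len(h))))
    neg_energy = 0.5 * np.einsum("ki,ij,kj->k", s, J, s) + s @ h
    p = np.exp(neg_energy - neg_energy.max())
    p /= p.sum()
    return p @ s, s.T @ (s * p[:, None])

def fit_ising(m, C, lr=0.2, tol=1e-10, max_iter=20_000):
    """Minimize S(J, h) = log Z - sum_{i<j} J_ij (c_ij + m_i m_j) - sum_i h_i m_i
    by gradient descent; C is the matrix of connected correlations."""
    m = np.asarray(m, dtype=float)
    target = np.asarray(C, dtype=float) + np.outer(m, m)  # measured <s_i s_j>
    n = len(m)
    J, h = np.zeros((n, n)), np.zeros(n)
    for _ in range(max_iter):
        model_m, model_ss = model_moments(J, h)
        grad_J = model_ss - target      # dS/dJ_ij
        np.fill_diagonal(grad_J, 0.0)   # no self-couplings
        grad_h = model_m - m            # dS/dh_i
        if max(np.abs(grad_J).max(), np.abs(grad_h).max()) < tol:
            break
        J -= lr * grad_J
        h -= lr * grad_h
    return J, h

J_true = np.array([[0.0, 0.3, -0.2], [0.3, 0.0, 0.1], [-0.2, 0.1, 0.0]])
h_true = np.array([0.1, -0.2, 0.05])
m_data, ss_data = model_moments(J_true, h_true)
J_fit, h_fit = fit_ising(m_data, ss_data - np.outer(m_data, m_data))
```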

Since the inverse Ising problem is very hard to solve in full generality, we look for an expansion for small values of the correlations. We multiply all of them by a parameter $\beta$ that we introduce, so that a series expansion around $\beta = 0$ corresponds to a small-correlation series. Equation (8.15) then reads

$$S(\{J_{ij}\}, \{\lambda_i\}; \{m_i\}, \{\beta c_{ij}\}) = \log \sum_{\{\sigma\}} \exp\Big\{ \sum_{i<j} J_{ij} \big[ (\sigma_i - m_i)(\sigma_j - m_j) - \beta c_{ij} \big] + \sum_i \lambda_i (\sigma_i - m_i) \Big\} . \tag{8.19}$$

We now seek an expansion of $S$ at the minimum (see Eq. (8.16)) for small $\beta$:

$$S(\{m_i\}, \{\beta c_{ij}\}) = S^0 + \beta S^1 + \beta^2 S^2 + \ldots , \tag{8.20}$$

from which we can also extract series for $J^*_{ij}$ and $\lambda^*_i$ using Eqs. (8.17) and (8.18).

The first term of the expansion of $S$ is trivial to determine, since for $\beta = 0$ the spins are uncorrelated, which corresponds to independent spins:

$$S^0 = -\sum_i \left[ \frac{1+m_i}{2} \ln\frac{1+m_i}{2} + \frac{1-m_i}{2} \ln\frac{1-m_i}{2} \right] . \tag{8.21}$$
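As a quick consistency check of $S^0$, recall that at $\beta = 0$ the minimization involves only the fields, with $J_{ij} = 0$ and $\tanh h_i = m_i$. Hence $S^0 = \sum_i [\log(2\cosh h_i) - h_i m_i]$. The sketch below (function names are ours) verifies numerically that this equals the independent-spin entropy $-\sum_i \big[\frac{1+m_i}{2}\ln\frac{1+m_i}{2} + \frac{1-m_i}{2}\ln\frac{1-m_i}{2}\big]$.

```python
import math

def s0_entropy(ms):
    """Entropy of independent spins with magnetizations ms (requires |m| < 1)."""
    total = 0.0
    for m in ms:
        for p in ((1 + m) / 2, (1 - m) / 2):
            total -= p * math.log(p)
    return total

def s0_from_fields(ms):
    """log Z - sum_i h_i m_i at J = 0, with the minimizing fields h_i = atanh(m_i)."""
    return sum(math.log(2 * math.cosh(math.atanh(m))) - math.atanh(m) * m for m in ms)

ms = [0.1, -0.45, 0.8]
print(s0_entropy(ms), s0_from_fields(ms))
```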



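To find the non-trivial terms of the entropy, we proceed as follows. First, we define a potential $U$ on the spin configurations (note the new term at the end):

$$U(\{\sigma_i\}) = \sum_{i<j} J^*_{ij}(\beta) \big[ (\sigma_i - m_i)(\sigma_j - m_j) - \beta c_{ij} \big] + \sum_i \lambda^*_i(\beta) (\sigma_i - m_i) + \sum_{i<j} c_{ij} \int_0^\beta d\beta'\, J^*_{ij}(\beta') \tag{8.22}$$

We also define a new entropy (compare with Eq. (8.15)):

$$\tilde S(\{m_i\}, \{c_{ij}\}, \beta) = \log \sum_{\{\sigma\}} e^{U(\{\sigma_i\})} . \tag{8.23}$$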






Note that $U$ depends on the values of the couplings $J^*_{ij}(\beta')$ for all $\beta' < \beta$, and not only on their value at the $\beta$ for which we want to compute $U$. $\tilde S$ and $S$ are related by

$$S(\{m_i\}, \{c_{ij}\}, \beta) = \tilde S(\{m_i\}, \{c_{ij}\}, \beta) - \sum_{i<j} c_{ij} \int_0^\beta d\beta'\, J^*_{ij}(\beta') . \tag{8.24}$$



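The expression for $\tilde S$ was chosen so that it is independent of $\beta$, as a short calculation shows:

$$\frac{d\tilde S}{d\beta} = \frac{dS}{d\beta} + \sum_{i<j} c_{ij} J^*_{ij}(\beta) = -\sum_{i<j} c_{ij} J^*_{ij}(\beta) + \sum_{i<j} c_{ij} J^*_{ij}(\beta) = 0 . \tag{8.25}$$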



Note that this holds for any value of $\beta$. Hence $\tilde S$ is constant and equal to its value at $\beta = 0$, namely $S^0$, given by Eq. (8.21).

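We then use the fact that $\tilde S$ is independent of $\beta$ to write self-consistency equations for the derivatives of $S$. For example, to determine $S^1$, we write

$$S^1 = \left.\frac{dS}{d\beta}\right|_{\beta=0} = \left.\left( \frac{dS}{d\beta} - \frac{d\tilde S}{d\beta} \right)\right|_{\beta=0} = -\sum_{i<j} c_{ij} J^*_{ij}(0) = 0 , \tag{8.26}$$

where we used $J^*_{ij}(0) = 0$, which follows from the fact that at $\beta = 0$ the spins are decoupled.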

For higher orders, we must proceed order by order. To find $S^k$ once $S^p$ has been computed for all $p < k$, we start from the equation

$$\left.\frac{d^k \tilde S}{d\beta^k}\right|_{\beta=0} = 0 ,$$

an equality that follows from Eq. (8.25). After explicitly computing the derivative, we end up with an equation of the type

$$\left.\frac{d^k \tilde S}{d\beta^k}\right|_{\beta=0} = -S^k + Q_k = 0 , \tag{8.27}$$

where $Q_k$ is an expression that depends on the magnetizations, the correlations, and the derivatives at zero of order $p < k-2$ of the couplings $J^*_{ij}$ and fields $\lambda^*_i$. Using Eqs. (8.17) and (8.18), these derivatives of the couplings and fields can be expressed in terms of the already known $S^p$ for $p < k$. This gives an explicit expression for $Q_k$, and hence for $S^k$. The final result for $S$ is thus



-E 



1 + TOj , 1 + rrii 1 — rrii , 1 — m 8 
In + In 



+ -P 3 KlrrumjLiLj 



i<j 



6 



i<j 



2 



^ KijLiLj + P 3 ^2 KijKjkKkiLiLjL k 

i<j i<j<k 

^ KijK jk K k iK u LiLjL k Li 

i,j,k,l 

+0(P 5 ) . 



P 4 



(8.28) 
The couplings are given by

4({cfci},W,/3) = f3K iJ -2f3 2 m t m J Kf J -f3 2 J2K jk K kl L k 

k 

+ \(3 3 Kf J [l + 3m 2 + 3m 2 + 9m 2 m 2 } 
+/3 3 K iAK%L 3 + K 2 kl U)L k 

+(3 3 K^KmKuL^ + 0((3 4 ) , (8.29) 

k,l 

and the fields by

_^3 (1 + 3m 2) J- Kf. m - 2/3 3 m, ^ K l3 K 3k K kl L 3 L k 

j<k 

+2/3 4 mi Y Y KikKkjKjiKuLiLjLk 

i<j k 

+/3 4 m / Y K i L i i 1 + m2 i + 3m ? + 3m « m ?] 

£ ^ + 0(/3 5 ) . (8.30) 

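In these expressions we used the notations

$$L_i = \langle (\sigma_i - m_i)^2 \rangle = 1 - m_i^2 , \tag{8.31}$$

which is the variance of an independent spin of magnetization $m_i$, and

$$K_{ij} = (1 - \delta_{ij}) \frac{\langle (\sigma_i - m_i)(\sigma_j - m_j) \rangle}{L_i L_j} = (1 - \delta_{ij}) \frac{c_{ij}}{L_i L_j} , \tag{8.32}$$

where $\delta_{ij}$ is the Kronecker delta.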

Unfortunately, we verified that the quality of the inference obtained with this formula degrades very quickly outside its range of validity, $c_{ij} \ll 1$. To improve its stability, we noticed that the last three terms of Eq. (8.28) can be written as an alternating series

$$-\frac{\beta^2}{4} \mathrm{Tr}(M^2) + \frac{\beta^3}{6} \mathrm{Tr}(M^3) - \frac{\beta^4}{8} \mathrm{Tr}(M^4) , \tag{8.33}$$



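where $M$ is the matrix defined by $M_{ij} = K_{ij}\sqrt{L_i L_j}$. Since $K_{ii} = 0$, we have $\mathrm{Tr}\,M = 0$, so the last three terms can be written as

$$\begin{aligned} S_{\rm loop} &= \frac{1}{2} \mathrm{Tr}\left( \beta M - \frac{\beta^2}{2} M^2 + \frac{\beta^3}{3} M^3 - \frac{\beta^4}{4} M^4 \right) \\ &= \frac{1}{2} \mathrm{Tr}\left[ \log(1 + \beta M) \right] + O(\beta^5) \\ &= \frac{1}{2} \log\left[ \det(1 + \beta M) \right] + O(\beta^5) . \end{aligned} \tag{8.34}$$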
This expression can also be recovered as a consequence of Eq. (8.9), which shows that it corresponds to a "mean-field" type approximation of the entropy. Moreover, if we replace the last three terms of Eq. (8.28) by this expression, the stability of the inference improves markedly.

The most likely explanation is as follows. The higher-order terms (of order $\beta^5$ and beyond) in the expansion of $\log[\det(1 + \beta M)]$ are contained in the expansion of $S$. Expanding them in powers of $\beta$ gives an alternating series that diverges for moderately large values of $c_{ij}$.
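The divergence argument can be illustrated on a toy matrix. The sketch below compares partial sums of $\mathrm{Tr}\log(1+M) = \sum_k (-1)^{k+1}\mathrm{Tr}(M^k)/k$ with $\log\det(1+M)$, for the hypothetical choice $M_{ij} = x$ ($i \neq j$) and $M_{ii} = 0$; the function names are ours.

The eigenvalues of this matrix are $(N-1)x$ and $-x$. The series therefore converges only when $(N-1)|x| < 1$, whereas $\log\det(1+M)$ remains finite as long as $1 + M$ is positive definite.

```python
import numpy as np

def uniform_M(n, x):
    """Toy matrix with M_ij = x for i != j and zero diagonal (so Tr M = 0)."""
    return x * (np.ones((n, n)) - np.eye(n))

def trace_log_series(M, order):
    """Partial sum sum_{k=1}^{order} (-1)**(k+1) Tr(M**k) / k of Tr log(1 + M)."""
    total, power = 0.0, np.eye(len(M))
    for k in range(1, order + 1):
        power = power @ M
        total += (-1) ** (k + 1) * np.trace(power) / k
    return total

def trace_log_exact(M):
    """log det(1 + M), valid whenever 1 + M has a positive determinant."""
    sign, logdet = np.linalg.slogdet(np.eye(len(M)) + M)
    if sign <= 0:
        raise ValueError("1 + M must have a positive determinant")
    return logdet

for x in (0.2, 0.6):  # spectral radius 0.4 (series converges) vs 1.2 (diverges)
    M = uniform_M(3, x)
    print(x, trace_log_series(M, 60), trace_log_exact(M))
```

Replacing the truncated series by the closed form keeps the first terms of the expansion while avoiding the divergent tail.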

Since the mean-field expression considerably improved the stability of the expansion, we sought to incorporate another possible approximation of the problem: the independent-pair approximation. In this approximation, each pair of sites $i$ and $j$ is treated as a system composed of just two spins. Using this expression together with the mean-field approximation described in the previous paragraph, we obtain



$$S^{\text{2-spin+loop}} = S^{\text{1-spin}} + \sum_{i<j} \left[ S^{\text{2-spin}}_{ij} - S^{\text{1-spin}}_i - S^{\text{1-spin}}_j \right] + S_{\rm loop} - \frac{1}{2} \sum_{i<j} \log\left( 1 - K_{ij}^2 L_i L_j \right) , \tag{8.35}$$



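where $S^{\text{1-spin}}_i$ is the entropy of the independent spin $i$ (the $i$-th term of Eq. (8.21)), $S^{\text{1-spin}} = \sum_i S^{\text{1-spin}}_i$, and

$$\begin{aligned} S^{\text{2-spin}}_{ij} = S^{\text{1-spin}}_i + S^{\text{1-spin}}_j &- \frac{1}{4}\left[ c_{ij} + (1-m_i)(1-m_j) \right] \log\left[ 1 + \frac{c_{ij}}{(1-m_i)(1-m_j)} \right] \\ &+ \frac{1}{4}\left[ c_{ij} - (1-m_i)(1+m_j) \right] \log\left[ 1 - \frac{c_{ij}}{(1-m_i)(1+m_j)} \right] \\ &+ \frac{1}{4}\left[ c_{ij} - (1+m_i)(1-m_j) \right] \log\left[ 1 - \frac{c_{ij}}{(1+m_i)(1-m_j)} \right] \\ &- \frac{1}{4}\left[ c_{ij} + (1+m_i)(1+m_j) \right] \log\left[ 1 + \frac{c_{ij}}{(1+m_i)(1+m_j)} \right] . \end{aligned} \tag{8.37}$$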



This expression is equivalent to Eq. (8.28), since the two differ only by terms of order $O(\beta^5)$ or higher, but it is much more stable numerically. The formula for $J^*_{ij}$ deduced from this expression is



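$$J^{*(\text{2-spin+loop})}_{ij} = J^{\rm loop}_{ij} + J^{*(\text{2-spin})}_{ij} - \frac{c_{ij}}{(1-m_i^2)(1-m_j^2) - c_{ij}^2} , \tag{8.38}$$

where

$$\begin{aligned} J^{*(\text{2-spin})}_{ij} = &\ \frac{1}{4}\ln\left[1 + K_{ij}(1+m_i)(1+m_j)\right] + \frac{1}{4}\ln\left[1 + K_{ij}(1-m_i)(1-m_j)\right] \\ &- \frac{1}{4}\ln\left[1 - K_{ij}(1-m_i)(1+m_j)\right] - \frac{1}{4}\ln\left[1 - K_{ij}(1+m_i)(1-m_j)\right] \end{aligned} \tag{8.39}$$

and

$$J^{\rm loop}_{ij} = -\frac{\left[(M+1)^{-1}\right]_{ij}}{\sqrt{(1-m_i^2)(1-m_j^2)}} . \tag{8.40}$$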
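The combined coupling formula can be sketched in a few lines. It sums three pieces:

- the mean-field ("loop") coupling $J^{\rm loop}_{ij} = -[(1+M)^{-1}]_{ij}/\sqrt{(1-m_i^2)(1-m_j^2)}$;
- the exact two-spin coupling $J^{*(\text{2-spin})}_{ij}$;
- a correction $-c_{ij}/[(1-m_i^2)(1-m_j^2) - c_{ij}^2]$, which removes the two-site loops counted in both of the other terms. It is minus the $c_{ij}$-derivative of the $-\frac12\log(1-K_{ij}^2L_iL_j)$ term of the entropy.

The implementation below is ours and uses the notation $L_i = 1 - m_i^2$, $K_{ij} = c_{ij}/(L_iL_j)$ and $M_{ij} = K_{ij}\sqrt{L_iL_j}$. As a sanity check, for a system of only two spins the loop term and the correction cancel exactly, so the formula returns the exact two-spin coupling.

```python
import numpy as np

def j_two_spin_plus_loop(m, C):
    """Couplings J*(2-spin+loop) from magnetizations m (shape (N,)) and
    connected correlations C (shape (N, N)); only off-diagonal C is used."""
    m = np.asarray(m, dtype=float)
    C = np.array(C, dtype=float)               # copy, then drop the diagonal
    np.fill_diagonal(C, 0.0)
    n = len(m)
    L = 1.0 - m ** 2                           # L_i = 1 - m_i^2
    LL = np.outer(L, L)
    K = C / LL                                 # K_ij = c_ij / (L_i L_j), K_ii = 0
    M = K * np.sqrt(LL)                        # M_ij = K_ij sqrt(L_i L_j)
    j_loop = -np.linalg.inv(np.eye(n) + M) / np.sqrt(LL)   # mean-field term
    mi, mj = m[:, None], m[None, :]
    j_pair = 0.25 * (np.log1p(K * (1 + mi) * (1 + mj))     # exact two-spin term
                     + np.log1p(K * (1 - mi) * (1 - mj))
                     - np.log1p(-K * (1 - mi) * (1 + mj))
                     - np.log1p(-K * (1 + mi) * (1 - mj)))
    double_count = C / (LL - C ** 2)           # two-site loops present in both terms
    J = j_loop + j_pair - double_count
    np.fill_diagonal(J, 0.0)
    return J
```

Only a matrix inversion and element-wise operations are needed, so the cost is dominated by the $O(N^3)$ inversion.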



Inference of a Hopfield model

In the previous part, we considered a generalized Ising model without specifying the form of the couplings. We now turn to the particular case of a Hopfield model. There are several advantages to using this model:

- There are multi-neuron recording experiments in which one does not try to find the couplings between the neurons but rather to extract patterns.
- We expect that reducing the number of degrees of freedom of the problem makes the inference procedure more stable.
- The Hopfield model can be solved analytically, so we hope to have better control over inference errors.

Note that in principle one could first infer the coupling matrix $\{J_{ij}\}$ and then diagonalize it to obtain a set of patterns. The problem with this approach is that it is not optimal from a Bayesian point of view: the patterns found will not necessarily be those that maximize the posterior probability. This is particularly problematic when the assumption that the system under study is described by a Hopfield model is only an approximation.

In this part we have two goals. First, we look for a formula that infers the patterns from the data. Second, we estimate how many times the system must be measured to obtain a good inference.

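First, note that finding the patterns that best account for a sequence of data is an ill-posed problem, since several sets of patterns can describe the same system. Consider, for example, the case of two patterns ($p = 2$):

$$
H = -\frac{N}{2}\left[\left(\frac1N\sum_i\xi_i^1\sigma_i\right)^2 + \left(\frac1N\sum_i\xi_i^2\sigma_i\right)^2\right]. \tag{8.41}
$$

If we define a new set of patterns given by $\tilde\xi_i^1 = \xi_i^1\cos\theta + \xi_i^2\sin\theta$ and $\tilde\xi_i^2 = -\xi_i^1\sin\theta + \xi_i^2\cos\theta$, the new Hamiltonian is

$$
\tilde H = -\frac{N}{2}\left[\left(\frac1N\sum_i\left(\xi_i^1\cos\theta + \xi_i^2\sin\theta\right)\sigma_i\right)^2 + \left(\frac1N\sum_i\left(-\xi_i^1\sin\theta + \xi_i^2\cos\theta\right)\sigma_i\right)^2\right] \tag{8.42}
$$

$$
\tilde H = H. \tag{8.43}
$$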



In general, with $p$ patterns, a rotation of the patterns in $p$ dimensions does not change the Hamiltonian of the system: the system has a gauge invariance. To make the inference problem well posed, one must therefore either lift this degeneracy by adding constraints that fix the $p(p-1)/2$ degrees of freedom, or add a prior $P$ to the Bayesian probability. For technical reasons, we focus on the first solution.
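The gauge invariance can be checked numerically. The sketch below uses the squared-overlap form of the two-pattern Hamiltonian. The prefactor $-N/2$ is an assumption and does not affect the invariance. The code rotates a pair of random binary patterns by an arbitrary angle and compares energies for random configurations. Helper names are hypothetical.

```python
import math
import random

def hopfield_energy(sigma, patterns):
    """H = -(N/2) * sum_mu ((1/N) sum_i xi_i^mu sigma_i)^2."""
    N = len(sigma)
    return -0.5 * N * sum((sum(x * s for x, s in zip(xi, sigma)) / N) ** 2
                          for xi in patterns)

def rotate(patterns, theta):
    """Rotate two patterns by theta in pattern space, as in Eq. (8.42)."""
    xi1, xi2 = patterns
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a + s * b for a, b in zip(xi1, xi2)],
            [-s * a + c * b for a, b in zip(xi1, xi2)])

random.seed(0)
N = 20
patterns = ([random.choice((-1, 1)) for _ in range(N)],
            [random.choice((-1, 1)) for _ in range(N)])
rotated = rotate(patterns, 0.4)
sigma = [random.choice((-1, 1)) for _ in range(N)]
print(hopfield_energy(sigma, patterns), hopfield_energy(sigma, rotated))  # equal up to rounding
```

The rotated patterns are no longer binary, which is one reason an explicit gauge-fixing constraint or a prior is needed.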

To develop an inference method, the ferromagnetic phase and the paramagnetic phase must be treated separately. In the first case, we assume that the data available for the inference are measurements of the system's configurations. We further assume that these measurements are samples from the Boltzmann distribution. Since the minima of the free energy correspond to configurations magnetized along one of the patterns [Amit 85a], we assume that $l_1$ of the measured configurations are magnetized along the first pattern, $l_2$ along the second, and so on. In this case, estimating the correlation between two sites from the measurements gives



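$$
C_{ij} = \frac{\sum_{k=1}^{p} l_k\, m_i^k m_j^k}{\sum_{k=1}^{p} l_k}, \tag{8.44}
$$

where

$$
m_k = \frac1N\sum_i\xi_i^k\tanh(m_k\xi_i^k) \tag{8.45}
$$

and

$$
m_i^k = \tanh(m_k\xi_i^k), \tag{8.46}
$$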



from which the patterns can easily be inferred by diagonalizing the matrix $C_{ij}$. Note that this procedure implicitly chooses a gauge such that

$$
\sum_i\tanh(m_k\xi_i^k)\tanh(m_{k'}\xi_i^{k'}) = 0, \qquad k \neq k'. \tag{8.47}
$$
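The ferromagnetic-phase procedure can be illustrated on a toy example. The plain-Python sketch below rests on these assumptions:

- $p = 2$ binary patterns taking values $\pm\sqrt{\beta}$, chosen mutually orthogonal so that the gauge condition (8.47) holds exactly;
- known fractions $l_k/L$.

The code solves the self-consistency equation (8.45), builds $C_{ij}$ from (8.44), and diagonalizes it. Power iteration with deflation stands in for a generic eigensolver. The patterns are recovered up to a global sign through $\xi_i^k = \tanh^{-1}(m_i^k)/m_k$. All names and values are hypothetical.

```python
import math

def solve_overlap(beta, tol=1e-13, max_iter=100_000):
    """Nonzero root x of x = beta * tanh(x) (exists for beta > 1), by fixed-point iteration."""
    x = beta
    for _ in range(max_iter):
        x_new = beta * math.tanh(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def top_eigenpair(C, iters=2000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(C)
    v = [1.0 + 0.1 * i for i in range(n)]              # generic starting vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * C[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

beta, N = 1.5, 8
a = math.sqrt(beta)                                     # patterns take values +-sqrt(beta)
signs = [[1, 1, -1, -1, 1, 1, -1, -1],
         [1, -1, 1, -1, 1, -1, 1, -1]]                  # orthogonal sign patterns
xi = [[a * s for s in row] for row in signs]
m_k = solve_overlap(beta) / a                           # Eq. (8.45) reduces to x = beta tanh x, x = a m_k
m_vec = [[math.tanh(m_k * v) for v in row] for row in xi]          # Eq. (8.46)
weights = [0.6, 0.4]                                    # hypothetical l_k / L
C = [[sum(w * mv[i] * mv[j] for w, mv in zip(weights, m_vec)) for j in range(N)]
     for i in range(N)]                                 # Eq. (8.44)

recovered = []
for w in weights:                                       # modes emerge by decreasing l_k
    lam, v = top_eigenpair(C)
    scale = math.sqrt(lam / w)                          # |m^k| = sqrt(lambda_k / w_k)
    recovered.append([math.atanh(scale * vi) / m_k for vi in v])
    C = [[C[i][j] - lam * v[i] * v[j] for j in range(N)] for i in range(N)]  # deflation
```

With non-orthogonal patterns, the eigenvectors of $C_{ij}$ are rotated combinations of the $m^k$. This is the gauge choice that (8.47) makes explicit.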
The paramagnetic case is more interesting, but also more complex. First, since we are considering a system in a phase where it is not magnetized along any pattern, we assume there is an external field strong enough that the magnetizations $m_i$ are dominated by the local external field $h_i$. Also, to simplify the calculations that follow, we replace the usual Hamiltonian of the Hopfield model (Eq. (8.5)) by

$$
H = -\frac{1}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu(\sigma_i - \tanh h_i)(\sigma_j - \tanh h_j) - \sum_i h_i\sigma_i. \tag{8.48}
$$

This Hamiltonian reduces to Eq. (8.5), up to an additive constant, through a shift of the external field:

$$
h_i \to h_i - \frac1N\sum_{\mu=1}^{p}\sum_{j\neq i}\xi_i^\mu\xi_j^\mu\tanh h_j. \tag{8.49}
$$
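The field shift can be verified by brute force on a few spins. The sketch below assumes that Eq. (8.5) has the usual pairwise form $-\frac1N\sum_\mu\sum_{i<j}\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j - \sum_i h_i\sigma_i$. It checks that the centered Hamiltonian (8.48) and this form with the shifted field (8.49) differ by the same constant for every configuration. Random Gaussian patterns and all helper names are illustrative only.

```python
import math
import random
from itertools import product

def H_centered(sigma, xi, h):
    """Eq. (8.48)."""
    N = len(sigma)
    t = [math.tanh(x) for x in h]
    pair = sum(p[i] * p[j] * (sigma[i] - t[i]) * (sigma[j] - t[j])
               for p in xi for i in range(N) for j in range(i + 1, N))
    return -pair / N - sum(x * s for x, s in zip(h, sigma))

def H_usual(sigma, xi, h):
    """Assumed pairwise Hopfield form with external field h."""
    N = len(sigma)
    pair = sum(p[i] * p[j] * sigma[i] * sigma[j]
               for p in xi for i in range(N) for j in range(i + 1, N))
    return -pair / N - sum(x * s for x, s in zip(h, sigma))

def shifted_field(xi, h):
    """Eq. (8.49): h_i - (1/N) sum_mu sum_{j != i} xi_i xi_j tanh(h_j)."""
    N = len(h)
    return [h[i] - sum(p[i] * p[j] * math.tanh(h[j])
                       for p in xi for j in range(N) if j != i) / N
            for i in range(N)]

random.seed(3)
N, P = 5, 2
xi = [[random.gauss(0, 1) for _ in range(N)] for _ in range(P)]
h = [random.uniform(-1, 1) for _ in range(N)]
h_eff = shifted_field(xi, h)
gaps = [H_centered(s, xi, h) - H_usual(s, xi, h_eff) for s in product((-1, 1), repeat=N)]
print(max(gaps) - min(gaps))   # ~0: the two energies differ by a constant
```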

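Using Bayes' theorem (Eq. (8.10)), we then have

$$
P(\{\xi\}\,|\,\{\sigma\}) = \frac{P_0(\{\xi\})}{Z(\{\xi\})^L\,P(\{\sigma\})}\prod_{l=1}^{L}\exp\left[\frac{\beta}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu(\sigma_i^l - \tanh h_i)(\sigma_j^l - \tanh h_j) + \beta\sum_i h_i\sigma_i^l\right]. \tag{8.50}
$$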



This probability depends only on the measured values of the magnetizations and correlations,

$$
m_i = \frac1L\sum_{l=1}^{L}\sigma_i^l, \qquad c_{ij} = \frac1L\sum_{l=1}^{L}\sigma_i^l\sigma_j^l - m_im_j, \tag{8.51}
$$



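giving

$$
P(\{\xi\}\,|\,\{\sigma\}) = \frac{P_0(\{\xi\})}{Z(\{\xi\})^L\,P(\{\sigma\})}\exp\left\{\frac{\beta L}{N}\sum_{\mu=1}^{p}\sum_{i<j}\xi_i^\mu\xi_j^\mu\left[c_{ij} + (m_i - \tanh h_i)(m_j - \tanh h_j)\right] + \beta L\sum_i h_im_i\right\}. \tag{8.52}
$$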
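The step from (8.50) to (8.52) rests on the identity $\sum_l(\sigma_i^l - t_i)(\sigma_j^l - t_j) = L[c_{ij} + (m_i - t_i)(m_j - t_j)]$ with $t_i = \tanh h_i$. The short sketch below checks this identity on random $\pm1$ data. The sample size and field values are hypothetical.

```python
import math
import random

random.seed(1)
N, L = 4, 50
samples = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(L)]
h = [0.2, -0.4, 0.1, 0.3]                        # hypothetical external fields
t = [math.tanh(x) for x in h]
m = [sum(s[i] for s in samples) / L for i in range(N)]              # Eq. (8.51)

def c(i, j):
    """Connected correlation of Eq. (8.51)."""
    return sum(s[i] * s[j] for s in samples) / L - m[i] * m[j]

max_gap = 0.0
for i in range(N):
    for j in range(i + 1, N):
        lhs = sum((s[i] - t[i]) * (s[j] - t[j]) for s in samples)   # as in (8.50)
        rhs = L * (c(i, j) + (m[i] - t[i]) * (m[j] - t[j]))         # as in (8.52)
        max_gap = max(max_gap, abs(lhs - rhs))
print(max_gap)   # ~0 up to rounding
```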



To optimize this quantity with respect to the patterns $\xi_i^\mu$, one must write an explicit expansion of $\log Z$ for large $N$; this is done in Appendix D. Since the minimization itself is rather technical, we refer the reader to Chapter 6 for it. The final result, to leading order in $N$, is



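$$
h_i^0 = \tanh^{-1} m_i \tag{8.53}
$$

and

$$
\xi_i^\mu \approx \sqrt{N\left(1 - \frac{1}{\Lambda_\mu}\right)}\;\frac{v_i^\mu}{\sqrt{1 - \tanh^2 h_i^0}}, \tag{8.54}
$$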
where $\Lambda_\mu$ and $v^\mu$ are, respectively, the $\mu$-th largest eigenvalue of the matrix $M$ and its associated eigenvector, and $M$ is given by

$$
M_{ij} = \frac{c_{ij}}{\sqrt{(1 - \tanh^2 h_i^0)(1 - \tanh^2 h_j^0)}}. \tag{8.55}
$$



It is interesting to note that this expression implies

$$
J_{ij} = \frac1N\sum_\mu\xi_i^\mu\xi_j^\mu = \sum_\mu\frac{\Lambda_\mu - 1}{\Lambda_\mu}\,\frac{v_i^\mu v_j^\mu}{\sqrt{(1-m_i^2)(1-m_j^2)}}, \tag{8.56}
$$



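which, when the sum runs over all $N$ eigenmodes, is exactly the expression already found in Eq. (8.40) for the generalized Ising model. Indeed, $\sum_\mu(1 - \Lambda_\mu^{-1})\,v_i^\mu v_j^\mu = \delta_{ij} - (M^{-1})_{ij}$, and the matrix of Eq. (8.55) coincides with the matrix $M+1$ of Eq. (8.40).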
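For two spins the eigenpairs of $M$ are available in closed form, so the link between (8.54), (8.56) and (8.40) can be checked directly. The sketch below makes two assumptions: hypothetical values $m = (0.2, -0.5)$ and $c_{12} = 0.15$, and $\tanh h_i^0 = m_i$ as in (8.53). It checks two things. First, summing (8.56) over both modes reproduces the loop coupling (8.40). Second, the top-mode pattern from (8.54) contributes $(1 - 1/\Lambda)\,v_1v_2/\sqrt{L_1L_2}$ to $J_{12}$.

```python
import math

def pattern_from_mode(lam, v, m, N):
    """Eq. (8.54) for one mode with lam > 1."""
    return [math.sqrt(N * (1 - 1 / lam)) * vi / math.sqrt(1 - mi ** 2)
            for vi, mi in zip(v, m)]

m = [0.2, -0.5]          # hypothetical magnetizations
c12 = 0.15               # hypothetical connected correlation
N = 100                  # hypothetical system size (only sets the scale of xi)
L1, L2 = 1 - m[0] ** 2, 1 - m[1] ** 2
r = c12 / math.sqrt(L1 * L2)                    # off-diagonal entry of M, Eq. (8.55)
s = 1 / math.sqrt(2)
modes = [(1 + r, [s, s]), (1 - r, [s, -s])]     # exact eigenpairs of [[1, r], [r, 1]]

# Eq. (8.56) summed over both eigenmodes.
J_modes = sum((lam - 1) / lam * v[0] * v[1] for lam, v in modes) / math.sqrt(L1 * L2)
# Eq. (8.40): the 2x2 inverse of [[1, r], [r, 1]] has off-diagonal -r / (1 - r^2).
J_loop = (r / (1 - r ** 2)) / math.sqrt(L1 * L2)

xi = pattern_from_mode(*modes[0], m, N)         # top mode only (lambda > 1 since c12 > 0)
J_top = xi[0] * xi[1] / N
print(J_modes, J_loop, J_top)
```

Keeping only the modes with $\Lambda_\mu > 1$, as (8.54) requires, truncates this sum. That truncation is what separates the Hopfield estimate from the full mean-field one.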

In the hope of finding a new expression for the inference, we also determined the subleading corrections to Eq. (8.54); the details are given in Chapter 6. Unfortunately, the resulting expression is very sensitive to random errors in the measured magnetizations and correlations, and is therefore of little use for real data.

Another interesting problem is to determine how many measurements of a Hopfield system are needed to infer its patterns accurately. To answer this question, we computed the Shannon entropy of the inference procedure. It is reasonable to expect that when $S/N \ll 1$ the patterns can be found with a small error. Recall that the Shannon entropy of a probability distribution $P$ defined on a set $\Omega$ is

$$
S = -\sum_{\omega\in\Omega}P(\omega)\log P(\omega). \tag{8.57}
$$



For our inference, the probability is given by Bayes' theorem (Eq. (8.10)). We assume that the patterns can only take the values $\pm\sqrt{\beta}$, where $\beta$ is a fixed constant that we identify with an inverse temperature. We first treat the case of a single pattern. We then have



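$$
S[\{\sigma^l\}] = -\frac{1}{\mathcal N[\{\sigma^l\}]}\sum_{\{\xi\}}\exp\left(\frac1N\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - L\log Z(\{\xi\})\right)\left[\frac1N\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - L\log Z(\{\xi\}) - \log\mathcal N[\{\sigma^l\}]\right], \tag{8.58}
$$

where

$$
\mathcal N[\{\sigma^l\}] = \sum_{\{\xi\}}\exp\left(\frac1N\sum_{l=1}^{L}\sum_{i<j}\xi_i\xi_j\sigma_i^l\sigma_j^l - L\log Z(\{\xi\})\right). \tag{8.59}
$$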
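For very small systems the posterior entropy can be computed by exhaustive enumeration. The toy sketch below makes these assumptions:

- a single pattern with no external field and a uniform prior;
- $\pm1$ patterns with $\beta$ written explicitly, which is equivalent to $\pm\sqrt\beta$ patterns with $\beta$ absorbed;
- $L$ configurations sampled exactly from the Boltzmann distribution of a hidden pattern.

Under these assumptions $Z(\{\xi\})$ is the same for every binary pattern (gauge $\sigma_i \to \xi_i\sigma_i$) and cancels from the posterior. Sizes, seeds and names are hypothetical.

```python
import math
import random
from itertools import product

def shannon_entropy(probs):
    """Eq. (8.57) in nats; terms with P = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pair_sum(xi, sigma):
    """sum_{i<j} xi_i xi_j sigma_i sigma_j = ((xi . sigma)^2 - N) / 2 for +-1 entries."""
    overlap = sum(a * b for a, b in zip(xi, sigma))
    return (overlap * overlap - len(xi)) / 2

def posterior_entropy(samples, beta):
    """Entropy of P(xi | samples) over all +-1 patterns (uniform prior, zero field)."""
    N = len(samples[0])
    log_w = [beta / N * sum(pair_sum(xi, s) for s in samples)
             for xi in product((-1, 1), repeat=N)]
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]      # shift by the max for numerical stability
    total = sum(w)
    return shannon_entropy([x / total for x in w])

random.seed(2)
N, beta = 8, 1.5                                 # hypothetical size and inverse temperature
true_xi = [random.choice((-1, 1)) for _ in range(N)]
configs = list(product((-1, 1), repeat=N))
boltzmann = [math.exp(beta / N * pair_sum(true_xi, s)) for s in configs]

entropies = {}
for L in (1, 5, 20, 80):
    samples = random.choices(configs, weights=boltzmann, k=L)
    entropies[L] = posterior_entropy(samples, beta)
print(entropies)
```

The posterior is symmetric under $\xi \to -\xi$. Its entropy therefore never drops below $\log 2$, which is the residual global-sign ambiguity.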



Note that $S$ depends explicitly on the measurements $\{\sigma^l\}$. However, it is natural to expect that, for a large enough system and a not-too-small number of measurements, the entropy depends more on the characteristics of the system than on the details of the measurements. We therefore compute $\langle S\rangle$, where the average is taken with respect to the Boltzmann probability of measuring the configurations. We assume that the system that generated them is described by a Hopfield model.

The behavior of $\langle S\rangle$ is very different in the two phases of the system. If the system is in a ferromagnetic phase, the entropy decreases exponentially with the number of measurements $L$. The details of the calculation and the analytical expression of the entropy can be found in Chapter 7. The results are shown in the following plot:




[Figure 8.4 plot: entropy per spin versus $L$, for $L$ from 50 to 300.]

Figure 8.4: Entropy per spin as a function of the number of measurements $L$ for $\beta = 1.1$. Note the asymptotic behavior $\langle S\rangle \approx C e^{-\gamma L}$, where $\gamma = \log\cosh(\beta m)$.



This result shows that the number of measurements needed to infer the pattern of the system is an intensive quantity, i.e., it remains finite when the system size tends to infinity.
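As a rough illustration of the intensive scaling, the decay rate $\gamma$ of Fig. 8.4 can be evaluated for $\beta = 1.1$. The sketch reads the $m$ in the caption as the ferromagnetic overlap $m^*$ solving $m^* = \tanh(\beta m^*)$, as in Appendix E. It also takes a hypothetical prefactor $C = 1$. Both choices are assumptions, so the resulting $L$ is an order-of-magnitude estimate only. Note that nothing in the computation depends on $N$.

```python
import math

def mattis_overlap(beta, tol=1e-14, max_iter=100_000):
    """Positive root of m = tanh(beta * m) (exists for beta > 1), by fixed-point iteration."""
    m = 1.0
    for _ in range(max_iter):
        m_new = math.tanh(beta * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

beta = 1.1
m_star = mattis_overlap(beta)
gamma = math.log(math.cosh(beta * m_star))     # decay rate in <S> ~ C exp(-gamma L)

# With a hypothetical C = 1, <S> falls below eps once L > log(1/eps) / gamma.
eps = 1e-6
L_needed = math.ceil(math.log(1 / eps) / gamma)
print(m_star, gamma, L_needed)
```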

When the system is in the paramagnetic phase, the situation changes. A number of measurements $L = \alpha N$ proportional to the system size is then needed to infer the pattern. As shown in Chapter 7, the calculations are also more complex. One has to use methods from the theory of disordered systems (notably the replica method) to obtain an analytical expression for the average entropy. A plot illustrating the results is shown in Fig. 8.5.
[Figure 8.5 plot: entropy per spin versus $\alpha$, with $\alpha_c$ marked on the horizontal axis.]



Figure 8.5: Entropy per spin as a function of $\alpha$ for $\beta = 0.5$. The solid line corresponds to the analytical solution found in Chapter 7, and the points correspond to numerical results.



Finally, for the case of several patterns in the paramagnetic phase, we find that the entropy of the inference of each pattern decreases like the entropy of a system with a single pattern (Fig. 8.5).
Appendix A

Details of the small-$\beta$ expansion



Let $O$ be an observable of the spin configuration (which can explicitly depend on the inverse temperature $\beta$), and

$$
\langle O\rangle = \frac1Z\sum_{\{\sigma\}}O(\{\sigma\})\,e^{U(\{\sigma\})} \tag{A.1}
$$

its average value, where $U$ is defined in (4.13), and $Z = \exp(S)$. The derivative of the average value of $O$ fulfills the following identity,



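$$
\frac{\partial\langle O\rangle}{\partial\beta} = \frac1Z\sum_{\{\sigma\}}\left(\frac{\partial O}{\partial\beta} + O\frac{\partial U}{\partial\beta}\right)e^{U} - \frac{1}{Z^2}\frac{\partial Z}{\partial\beta}\sum_{\{\sigma\}}O\,e^{U} = \left\langle\frac{\partial O}{\partial\beta} + O\frac{\partial U}{\partial\beta}\right\rangle, \tag{A.2}
$$

where the term in $Z^2$ vanishes as a consequence of (4.16).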



A.1 Second order expansion

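Using (A.2) and (4.21),

$$
\frac{d^3S}{d\beta^3} = \frac{d}{d\beta}\left[\left\langle\frac{d^2U}{d\beta^2}\right\rangle + \left\langle\left(\frac{dU}{d\beta}\right)^2\right\rangle\right] = \left\langle\frac{d^3U}{d\beta^3}\right\rangle + 3\left\langle\frac{d^2U}{d\beta^2}\frac{dU}{d\beta}\right\rangle + \left\langle\left(\frac{dU}{d\beta}\right)^3\right\rangle. \tag{A.3}
$$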

A straightforward calculation gives (where for clarity we omit the notation $|_{\beta=0}$ and the $*$ subscript from $J_{ij}$ and $\lambda_i$)



= 



d 2 UdU' 
' v d f 2 <) I 1 



dUX 
dp) 



d 2 U8U 
Wdp/ 



dU 

~d/3 



Ed 2 Jij dJij \ ^ d 2 \i d\i 

W 2 di L ' L ' ^ 



i<j 



dJij dJjk dJki 



d/3 2 d(3 



- 6 E "df^-df 1 ^ 1 ^ 



i<j<k 



d/3 d/3 d/3 



(A.4) 
(A.5) 



+ E(^) 34 ^^ + 6 ES§Sf^ (A.6) 

A^A \ ' / A^A 
Using (A.3) and the expressions of the derivatives of $\lambda_i$ at $\beta = 0$, we obtain



$$
\frac{d^3S}{d\beta^3} = 4\sum_{i<j}\frac{c_{ij}^3\,m_im_j}{(1-m_i^2)^2(1-m_j^2)^2} + 6\sum_{i<j<k}\frac{c_{ij}c_{jk}c_{ki}}{(1-m_i^2)(1-m_j^2)(1-m_k^2)}, \tag{A.7}
$$

from which we deduce

$$
\frac{d^3S}{d\beta^3} = 4\sum_{i<j}K_{ij}^3m_im_jL_iL_j + 6\sum_{i<j<k}K_{ij}K_{jk}K_{ki}L_iL_jL_k \tag{A.8}
$$

and

$$
\frac{d^2J_{ij}}{d\beta^2} = -4m_im_jK_{ij}^2 - 2\sum_k K_{jk}K_{ki}L_k. \tag{A.9}
$$



A.2 Third order expansion



The procedure to derive the third order expansion for the coupling is identical 
to the second order one. We start from 



= 



d 4 S 



df3 A \ dp 



d 4 U\ I fd 2 U 

i 



dp 2 



and evaluate each term in the sum: 
'd 4 U 



dp 4 
d 2 U 



dp 2 



/d 3 UdU 
dU\ 2 d 2 U 



dp J dp 2 



dUX 
dp) 



o 



^ dp 3 



'd 3 UdU\ I (dU\ 2 d 2 U 

: WW/ \ w w 



KijLiLj 



dUX 
dP) 
(A.10) 



Ef d 2 Jij 
V <9/3 2 



i<j 



op 3 



(A.H) 

(A.12) 
(A.13) 



= 2 EE^. 



d 2 j, 



hi 



i<k j 



+ E5>. 



<9 2 A ? , 



<9/3 2 



<9 2 J 

L^Lfc + 4 £ K 2 - i) j2 in <">j L < L ., 



1<J 



iW { - 2mi)I " L > ~ 



(D) E«S^ (A.M) 

v p 7 / o »<j 



= £^.(3^ + 1)^(3^ + 1)^ + 3 K 2 K 2 kl L l L J L k L l + 

i<j i<j, k<l (k^i, l^j) 

+ 6 E E ^S^ 3 ™? + !)^^ + 



+ 12 £ KijKjkKkiULjLk [AmiinjKij + Amim k K ik + 4m fc miK fci ] + 

«<i<fc 

+ 3 E 

KijKjkK k iKnLiLjL k Li 



(A.15) 
Using the results of Eqs. (4.9) and (4.10), we can write all the terms above in the
same form



Y K l L ^ 



i<j 



12 Y Y K ^ K kj d -^L i L i L k 

i<j k 



Y K l^ m ' m o L ^ 



i<j 



2 7 \ 2 



d 2 J, 



<9/3 2 



2 / ^ l± ^3 



d 2 \ 



i 3 



-3 £ 

i<j,k<l(k^i,l^j) 



K ij K kl L i L j L k L l 



i<j k 



(A. 16) 

-48 KfjKikKkjmimjLiLjLk - 

i<j k 

-12 ^ K i jKj k K kl K li L i LjL k Lx 

i,j,k,i m 

~^YY K ^ K i L ^ L l (A- 17) 

-2 Y Y KfjKjkKkimimjLiLjLk (A. 18) 

i<j k 

A8j2Kt j rn 2 i m 2 j L i L j + 

i<j 

48 Y Y KfjKii. Kijiiiiiiijl.il. jl.i, + 

i<j k 

+6 ^ KijKjkKkiKuLiLjLkLi 

i,j,k,l (^) 

+12 y £Y K ik K i L t L * L 3 (A. 19) 



-24^^1^(1-7^)^ 
-48^^X^.m 2 ^^ 



(A.20) 



«<j A; 



(A.21) 
Again we find Eq. (4.34), with

Q 3 = -J2 K i [(3m i 2 + l)(3m| + l)-48m t 2 m|]L J L 



i<j 

+ 12j2J2 K ^ K i L ^ L l + 3 E KyKfiKuKuLiLsLtU 

i<j k i,j,k,l (ft 

i j i<j 

which gives the fourth order contribution to the entropy, 

= -2 E K i I 1 + 3 ^ 2 + H + 9 ™*H 2 ] L * L s - 12 E E 

i<j i<j k 

- 24 {K tj K jk K kl K u + K ik K kj KijKii + K l3 K jl K lk K ki )L i L j L k I^.2?>) 

i<j<k<l 

and the third order contribution to the coupling, 



d 3 J tj 



= 2Kf j [1 + 3m 2 + 3m 2 + 9m 2 m 2 ] + 6 ^ + K 2 kl L t )L k + 

k (ft, ft) 

+ 6 Kj k K k iK u L k Li . (A.24) 



k,l 
(kftjft) 
Appendix B

Large magnetization expansion 



Equation (4.39) suggests that to expand $J^*_{ij}$ to the order of $(L_i)^k$ one has to sum all the diagrams with up to $k+2$ spins. This statement is true if the expansion for $J^*_{ij}$ is of the form

$$
J^*_{ij} = A_{ij} + \sum_k L_kA_{ijk} + \sum_k\sum_l L_kL_lA_{ijkl} + \dots \tag{B.1}
$$

where the coefficients $A$ are polynomials in the couplings $K_{\alpha\beta}$ and the magnetizations $m_\alpha$ ($\alpha,\beta < n$). In the following, we show by recurrence that the above statement is true to any order of the expansion in $\beta$. First of all, from (4.10) we see that if $J^*_{ij}$ is of the form (B.1) up to order $k$, so is $\lambda^*_i$ to the same order.

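As we saw in section 4, to find an equation for the order-$k$ coupling $d^kJ^*_{ij}/d\beta^k$, one must evaluate $d^{k+1}S/d\beta^{k+1}$. Using Eq. (A.2), we can write

$$
\frac{d^{k+1}S}{d\beta^{k+1}} = \left\langle\left(\frac{d}{d\beta} + \frac{dU}{d\beta}\right)^{k}\frac{dU}{d\beta}\right\rangle = \sum_{a}P_a\left\langle\prod_{m}\frac{d^{a_m}U}{d\beta^{a_m}}\right\rangle, \tag{B.2}
$$

where $a$ is a multi-index with $|a| = k+1$ and $P_a$ a multiplicity coefficient. The highest order term of this expression evaluates to $L_iL_jK_{ij}\,d^kJ_{ij}/d\beta^k$.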

Due to the structure of $U$, the spin dependence in (B.2) will come either from the lower derivatives of $J^*_{ij}$ (of the form (B.1) by hypothesis), from the derivatives of $\lambda^*_i$, or explicitly from $U$. In the latter case we get a multiplicative factor $(\sigma_i - m_i)$. Hence we end up computing a term, with $k > 1$, of the form

$$
\langle(\sigma_i - m_i)^k\rangle = \frac{1-m_i^2}{2}\left[(1-m_i)^{k-1} + (-1)^k(1+m_i)^{k-1}\right]. \tag{B.3}
$$

Clearly, any term including $(\sigma_i - m_i)$ will give a multiplicative factor $L_i$ after averaging. As the spins are decoupled in the $\beta \to 0$ limit, we obtain the product of those factors over the spins in the diagram, as claimed.
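The closed form (B.3) and its explicit factor $L_i = 1 - m_i^2$ can be checked against a direct two-state average. For a single free spin, $P(\sigma = +1) = (1+m)/2$. The magnetization values below are arbitrary.

```python
def central_moment_direct(m, k):
    """<(sigma - m)^k> for sigma = +-1 with P(sigma = +1) = (1 + m) / 2."""
    return (1 + m) / 2 * (1 - m) ** k + (1 - m) / 2 * (-1 - m) ** k

def central_moment_b3(m, k):
    """Closed form of Eq. (B.3), with the factor L = 1 - m^2 made explicit."""
    return (1 - m * m) / 2 * ((1 - m) ** (k - 1) + (-1) ** k * (1 + m) ** (k - 1))

for m in (-0.7, 0.1, 0.55):
    for k in range(1, 7):
        print(m, k, central_moment_direct(m, k), central_moment_b3(m, k))
```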
In this appendix, we use the procedure described in section 4.4.1 to verify all the terms of Eq. (4.40). We start by posing

$$
h_i = \frac12\ln\left(\frac{1+m_i}{1-m_i}\right) - \sum_j J_{ij}m_j + \sum_j J_{ij}^2m_i(1-m_j^2) + h_i^{(3)} + h_i^{(4)} + \dots \tag{C.1}
$$

and we will now proceed to evaluate each one of the h\ k \ 

C.1 Evaluation of $h_i^{(3)}(\{J\})$

Using Eqs. (4.41) and (4.42), we have 



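$$
h_i^{(3)} = \frac16\frac{\partial}{\partial m_i}\frac{d^3F}{d\beta^3} = \frac16\left\{-4\sum_{j<k}J_{jk}^3\frac{\partial}{\partial m_i}\left[m_j(1-m_j^2)m_k(1-m_k^2)\right] - 6\sum_{j<k<l}J_{jk}J_{kl}J_{lj}\frac{\partial}{\partial m_i}\left[(1-m_j^2)(1-m_k^2)(1-m_l^2)\right]\right\}, \tag{C.2}
$$

which can be simplified to

$$
\begin{aligned}
6h_i^{(3)} = &-4\sum_{j<k}J_{jk}^3\left\{\delta_{ij}\left[(1-m_j^2) - 2m_j^2\right]m_k(1-m_k^2) + \delta_{ik}\left[(1-m_k^2) - 2m_k^2\right]m_j(1-m_j^2)\right\} \\
&+ 12\sum_{j<k<l}J_{jk}J_{kl}J_{lj}\left[\delta_{ij}m_j(1-m_k^2)(1-m_l^2) + \delta_{ik}m_k(1-m_j^2)(1-m_l^2) + \delta_{il}m_l(1-m_j^2)(1-m_k^2)\right].
\end{aligned} \tag{C.3}
$$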
Finally, we get

$$
6h_i^{(3)} = -4(1-m_i^2)\sum_j J_{ij}^3m_j(1-m_j^2) + 8m_i^2\sum_j J_{ij}^3m_j(1-m_j^2) + 12m_i\sum_{j<k\,(j,k\neq i)}J_{ij}J_{jk}J_{ki}(1-m_j^2)(1-m_k^2). \tag{C.4}
$$

C.2 Evaluation of $h_i^{(3)}(\{c\})$

We will now use Eq. (C.4) to verify the terms of order $\beta^3$ of Eq. (4.40). We start with Eqs. (C.1) and (C.4):



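$$
\begin{aligned}
h_i(\{J_{ij}\}) = {}&\frac12\ln\left(\frac{1+m_i}{1-m_i}\right) - \sum_j J_{ij}m_j + \sum_j J_{ij}^2m_i(1-m_j^2) \\
&- \frac23(1-m_i^2)\sum_j J_{ij}^3m_j(1-m_j^2) + \frac43m_i^2\sum_j J_{ij}^3m_j(1-m_j^2) \\
&+ 2m_i\sum_{j<k\,(j,k\neq i)}J_{ij}J_{jk}J_{ki}(1-m_j^2)(1-m_k^2).
\end{aligned} \tag{C.5}
$$

Using Eq. (4.39), we pose $J_{ij} = K_{ij} - 2K_{ij}^2m_im_j - \sum_k K_{jk}K_{ki}(1-m_k^2) + O(c^3)$, yielding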



hi({c}) 



1 A-m, 

2 V 1 + m t 



- 2K 2 j m i m j - ^ K jk K ki {l - m 2 k ) 



a -ml) 



+ ^3 m 

i(^) 

- ^(1 - m 2 ) Kfjm^l - m 2 ) + ^m 2 ^ Xjm,(l - m 2 ) 
+ 2m, ]T ^^{1 - m 2 )(l - m\) + 0(c 4 ) . 
Simplifying this expression, we get 

~ V ?y i(^) j(#0 

-4^Xjm^( 1 -^ 2 )- 

-2^2K ij K jk K ki m i {\ -m 2 k )(l - m 2 )- 



(C.6) 



(C.7) 



- |(1 - m 2 ) £ Xjm,(l - m 2 ) + ^m 2 £ Kjm,(l - 

+ 2m, ^ K i:j K jk K ki (l - m 2 )(l - m 2 ) + 0(c 4 ) , 

i<fc 



m„- 
and finally



HUij}) = - - In f — — - J - J H m j + Yl K l m ^ 1 ~ m ?) 



- hi + 3m, 2 ) K* mj (l - m)) (C.8) 

- 2m t K tJ K jk K kl (l - m 2 )(l - m\) + 0(c 4 ) . 

j<k 



This matches Eq. (4.40) exactly.
Appendix D



Evaluation of log Z 



Using Eq. (6.24) and doing an integral transform, we get 

+ ~7^J2J2 ^ ~ tallh ^) f ' 

H=l v ^ /i=l v fj, i 



:f tanh ft,,- 



+ log 



2 cosh 



^X>,e +>».)]} 



(D.l) 



which is just Eq. (2.34), assuming that the magnetizations $m_\mu$ are $O(1/\sqrt N)$. Performing a Taylor expansion of this equation for large $N$, we have



exp 



+ 



log(2 cosh h 

i 



exp 



2ViV 



1 1 



12iV N 
+0(1/N 2 ) 



\J2 m ^i) (l-3tanh 2 /i i )(l -tanh 2 ^) 

i \ fi J 



where 



and 



X, = l-^E^l-tanh 2 ^), 



J jJLV 



^E^(l-tanh 2 ^) 



a/TV ^ 



(D.2) 



(D.3) 



(D.4) 
We use the expansion $e^x = 1 + x + \cdots$ to write this equation as a Gaussian integral



Z = exp 



log(2 cosh hi) 



2tt 



i + 4=E 

2^^ 



1 1 



12JVJV 



X (j2 m ^ t 1 - 3tanh2 w - tanh2 ^) + °(w 3/2 ) 

i \ f / 



• exp 



(D.5) 



which can be rewritten as averages with respect to a Gaussian distribution
Z = exp 



log(2 cosh M - 9 XI log 

i 

^2\^2 m ^i) (1 -3tanh 2 /ii)(l -tanh 2 /ii 

i \ jx J 



4iV ^ s % m l m l 



1 1 



12NN 
+0{1/N 3/2 ) 
These averages are easily calculated:



(D.6) 



\ I m li+v 



X^Xu 



(D.7) 



E<(^) 4 ) +3(EmX(er) 2 (er) 2 ) + odd terms 



AM4 



3E%+sE 



V 



(gag 



(D 



Finally, we define



^ E(^) 2 (en 2 (l - 3 tanh 2 h^l - tanh 2 ^) , (D.9) 



and 



s M/1 = . 



(D.10) 






which gives an explicit form for log Z: 

\o g Z = J2 Iog(2 cosh h^-\j2 lo § X, + ± E (1 " M ^"^ a + 0(l/iV 3 / 2 ) , 

(D.ll) 

where $\delta_{\mu\nu}$ is the Kronecker symbol.
Appendix E



Evaluation of m for the entropy of 
the Ising model 



Starting with Eq. (7.13):

$$
m_i = \left\langle\sigma_i^1\tanh\left(\beta m_i\sum_{l=1}^{L}\sigma_i^l\right)\right\rangle, \tag{E.1}
$$

which under the hypothesis $m_i = m$ can be rewritten as

$$
m = \sum_{\{\sigma^l\}}\sigma^1\tanh\left(\beta m\sum_{l=1}^{L}\sigma^l\right)\prod_{l=1}^{L}\frac{e^{\beta m^*\xi_i\sigma^l}}{2\cosh(\beta m^*)}. \tag{E.2}
$$

This last equation can be simplified using the variable change $\sigma^l \to \sigma^l\xi_i$:

$$
m = \sum_{\{\sigma^l\}}\xi_i\sigma^1\tanh\left(\beta m\,\xi_i\sum_{l=1}^{L}\sigma^l\right)\prod_{l=1}^{L}\frac{e^{\beta m^*\sigma^l}}{2\cosh(\beta m^*)} \tag{E.3}
$$

$$
m = \frac{1}{[2\cosh(\beta m^*)]^L}\sum_{\{\sigma^l\}}\sigma^1\tanh\left(\beta m\sum_{l=1}^{L}\sigma^l\right)\exp\left(\beta m^*\sum_{l=1}^{L}\sigma^l\right). \tag{E.4}
$$

Using the global symmetry $\sigma^l \to -\sigma^l$:

$$
m = \frac{1}{[2\cosh(\beta m^*)]^L}\sum_{\{\sigma^l\}}\sigma^1\tanh\left(\beta m\sum_{l=1}^{L}\sigma^l\right)\cosh\left(\beta m^*\sum_{l=1}^{L}\sigma^l\right). \tag{E.5}
$$

We verify that the solution of this equation is $m = m^*$:

$$
\begin{aligned}
m &= \frac{1}{[2\cosh(\beta m)]^L}\sum_{\{\sigma^l\}}\sigma^1\sinh\left(\beta m\sum_{l}\sigma^l\right) \\
&= \frac{1}{\beta L[2\cosh(\beta m)]^L}\frac{d}{dm}\sum_{\{\sigma^l\}}\cosh\left(\beta m\sum_{l}\sigma^l\right) \\
&= \frac{1}{\beta L[2\cosh(\beta m)]^L}\frac{d}{dm}[2\cosh(\beta m)]^L \\
&= \tanh(\beta m).
\end{aligned} \tag{E.6}
$$

The result follows, since $m^*$ is defined as the solution of $m^* = \tanh(\beta m^*)$.
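The verification (E.6) can also be done numerically by summing (E.5) exactly over all $2^L$ sequences for small $L$. The sketch below uses a hypothetical $\beta = 1.4$ and obtains $m^*$ by fixed-point iteration. With $m = m^*$, the right-hand side of (E.5) should return $m^*$ for every $L$.

```python
import math
from itertools import product

def rhs_E5(m, m_star, beta, L):
    """Right-hand side of Eq. (E.5), summing exactly over all sequences sigma^1..sigma^L."""
    total = 0.0
    for sig in product((-1, 1), repeat=L):
        S = sum(sig)
        total += sig[0] * math.tanh(beta * m * S) * math.cosh(beta * m_star * S)
    return total / (2 * math.cosh(beta * m_star)) ** L

beta = 1.4                      # hypothetical inverse temperature (beta > 1)
m_star = 1.0
for _ in range(1000):           # fixed point of m* = tanh(beta m*)
    m_star = math.tanh(beta * m_star)

results = {L: rhs_E5(m_star, m_star, beta, L) for L in (1, 2, 5, 8)}
print(m_star, results)          # every value should equal m_star
```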
Evaluation of N in the
paramagnetic phase 



We start with Eq. (7.20): 



N = 



yr dmi 



f3N 



E m *+ 



1=1 



1=1 i 

which after averaging yields 

in j i=i P =i v ^" 
x ( exp < /3JV
{§} " i=i P =i

AT 



exp 



+ ^E'<^E«M + £EE"H« 



N 



f3N 



i,p * ; i<j 

Performing an integral transformation and summing over $\sigma$, we obtain



(F.l) 



(F.2) 



/3iV 



E «) s 



E (^) 2 + E ln 2 cosh ( p E m ^ + ^ ) 

j i,i V p / 



• (F.3) 
Applying the variable change 4m 2 — > m 2 , 4m 2 — » m 2 , we have



N r 



-0Ln/2 



drrii drfii 



E/ TT ' 
. J \^v^v^ 



exp 



-JEW 2 - \^2^if 



i,l 



+E ln2c H V£E<^+\/^£ 



(F.4) 



Since, for small $x$, $\ln[2\cosh(x)] = \ln 2 + x^2/2 + O(x^4)$, we can approximate this expression by



A>\ ~ e- pLn ' 2 / II 



dm^ drrii 
2tt v 7 ^ 



exp 



+ ln in 2 + + § E(™<) 2 + E & 



Jew 2 - JE(™<) 2+ 



~ [det M]- L/2 e~^ 2 , 

{£},{£} 



Z,p,i 



(F.5) 



where M is the matrix 

/ i-t - 

v 7 ^ el. 



Si 4"i £i TV Si 4"i 6 



AT 



Z—ii Sj S« jv Z^i Sj Si 
JV ^—^i s « s i

. A V £ 2 £ n 

JV Z_/i Si Si 



JV Si 4"i 4"i 



JV Z^i Si Si jv Z^i Si Si 



Z^i 4"i 4"i 



• (F.6) 



JV Z-,i 



/ 



Fixing $q_{\rho\sigma} = \frac1N\sum_i\xi_i^\rho\xi_i^\sigma$ and $t_\rho = \frac1N\sum_i\xi_i^\rho\xi_i$ with the Lagrange multipliers $\hat q_{\rho\sigma}$ and $\hat t_\rho$, we have



jV r 



/II d ^ E ex p ^ ■ 



- - log det M - - q pa q pa 



P<<7 



a v ^ - a v ^ 1 x ^ „ a x ^ - 1 ~ 

p<a 



1 ~ „ a/3n 



2 ^^jV 

p 



(F.7) 



Since the sites are now completely uncorrelated, we can write this expression in a form solvable by the saddle-point method:



$$
\int\prod_{\rho<\sigma}dq_{\rho\sigma}\,d\hat q_{\rho\sigma}\prod_{\rho}dt_\rho\,d\hat t_\rho\;H^N, \tag{F.8}
$$
where H is given by
H = 



E ex p 



log det M - a E q pa q P a 



- f E +«E «t + f E tit" - 



p<<T 



a(3n 



(F.9) 



We look for the replica-symmetric saddle point of $H$. Setting $q_{\rho\sigma} = q$, $t_\rho = t$, $\hat q_{\rho\sigma} = \hat q$ and $\hat t_\rho = \hat t$, we have



H 



E exp 



an - 



« i , , , an(n — 1) „ 
--logdetM y — J -qq 



a(3n 



rjiy ex 
7-oc r trf* exp 



z 2 a , , , , «n(n — 1) ^ 

log det M ^ -go 

2 2 s 4 yy 



Performing a variable change on $\xi^\rho$, this expression evaluates to



(F.10) 



if 



/•°° d^ r 

an 



z a 



exp < log det M — 



tt + n log 



2 cosh 



at 



+ z\/ aq 



an(n — 1) 
2 

a/3n 



95 



(F.ll) 



Now we need to evaluate explicitly det M in the replica-symmetric hypothesis: 
detM = (1 - (3 + faf 1 x 



x 



;i -p)(l-/3)-(n- 1)(1 - ^)/3g - n^t 2 



log(detM) = log(l -0) +nlog[l - j3(l - q)} 



n(3 



t 2 (3 



q +0(n 2 ). 



l-(l-q)/3\l-/3 
After a short calculation, we find the saddle-point equations

^tanh ^— t + z^/a^ 



t 

q 

i 



= ^ tanh ^ 



a 



2 

2t/3 2 



i + ag 



(l-/3)[l-(l-g)/3]' 
g(l-/3)+t 2 /3 

(1-/3) [1 - (1 



2 ' 



(F.12) 
(F.13) 

(F.14) 
(F.15) 

(F.16) 

(F.17) 






where the average $\langle\cdot\rangle$ is taken with respect to the Gaussian variable $z$ with zero mean and unit standard deviation.

Finally, the entropy can be written as 



(S) = 



ap(l -q){l-0[2- 2q(l -/?)-(! + t 2 )$]} 



a 



2(1-/3) [l-{l- q )py 



ln[l 



+ (ln 



a 



2 cosh ( —t + z-\/aq 



a 
2 



[(l-q)q + tt] . 



(F.18) 
Entropy calculations details for
continuous patterns 



Our starting points are Eqs. (7.32) and (7.35): 
N[W},P] =JdxJdmJH dfcexp (|EE^W 



1=0 i<j 



+ m 2 — f3L l°g [2 cosh(m^)] 

i 

+ /31ogP ({&}) -iNxm + i^&t&vhimti) ) . 



(G.l) 



In the same way as for binary patterns, we use the replica trick to evaluate $\log\mathcal N$



v v v=\ i 



n L 



I E E E + E log p o({er» 



j<j ^=1 Z=l 



!/=l 



+ ^ E( m ') 2 - ^ L E E lo S t 2 «*h(m"0] 



!/=l 



I/=l 



v v i ) 



(G.2) 



and we will average the entropy with respect to all possible realizations of the measurements taken from a system whose patterns are given by $\{\xi\}$:



i=i i<j 



(G.3) 
Finally, we average with respect to the underlying patterns



N [{*},?]") = 



[ II d ^ E vTT^ 1 *^ w exp w E E iMt + 1^ Po&) 

J i {a} Z U4i}J [ JV l=1 iKj 



(G.4) 



Note that we used the same function $P_0(\{\xi\})$ both as the prior and as the distribution of the "real" patterns. This is equivalent to saying that one knows the statistical distribution of the patterns but wants to infer one particular realization of this distribution. Writing the average explicitly and performing a Gaussian transform, we have



N[{a},P] n ) = 



n L 



dQ\ 



n 



dQ 



x 



x 



n d ™") (n 7=) J= ^ e ex p(^) > ( g - 5 ) 



where 



H 



m E E^n 2 - f E(^) 2 + p E E E w*i 



2 ^^w./ 2 

i^=i z=i z=i 



i/=l z=i 



E E + E lo § p (^) + lo § p (^) 



E( m ') 2 + -^r™ 2 - ^ L E E lo s P cosh (™^)] 



i/=i 



n 

L log |^2 cosh(m|j) — iN rr^m" — iNyrh 



+i E E tanh ( m ^D + *y E & tanh ( m & 



(G.6) 



y=l i 



We now suppose that the prior probability of the patterns is independent and identically distributed across the sites. Mathematically, this means that $P_0(\{\xi_i\}) = \prod_i P_0(\xi_i)$. Under this assumption the sites decouple, and we have



dQi 



v=\ 1=1 

dx^ 



2tt 



X 



K u=l 



X 



n 

KV=1 



dy 



dmA N exp(NB) , 



2tt/ V27T 



(G.7) 
where



n L 



B = -§ ££«o 2 - JEW-) 2 +^ £(™o'- 



I/=l «=1 



I/=l 



+—m —tyxm — vym , 



i/=i 



W 1 } 



^g^V-Llog(2cosh(mO) 



tanh(mf) + log C v 



u=l 



d£ exp 



/?X]Qr^+/3iogpo(o 



1=1 



-/3L log(2 cosh(m^)) + i^t tanh(m"f ) 



(G.8) 



(G.9) 



(G.10) 



As before, we look for the replica-symmetric saddle point of this integral



Qi 



m 



I 



d£po(0£tanh(m£) = (£tanh(m£)> , 



(G.11) 



m = 



Qk 



d £ exp 

w 



m^a' — L log 2 cosh(mf) 



x 



x- 



= / 

J W} 



exp 



m^a' — L log 2 cosh(m£) 



i=i 



x 



X 



/ d&o(Of ^ ex P [Ef=i Q l(yl i - ^ log 2 cosh(m£) 



/ d &o(0 exp X)f =1 QV£ - L log 2 cosh(m£) 



/ cl^ (0^ tann ( m O ex P 
/ d£p (O ex P
(G.12)



(G.13) 






Finally, we have 



1 

N 



2 



/3Lm 2 



1=1 



W 1 } 

x log<( / d£exp 



L 

fnl^^a 1 — Llog(2cosh(m£)) 



i=i 



x 



' ]T Q£a l - (3L log(2 cosh(mO) + P logp (0 



(G.14) 



from which it is straightforward to evaluate the entropy using Eq. (7.33). 
Details of the evaluation of the
entropy for the Hopfield model 



Starting with Eq. (7.48), we have 
(N n ) = 



^vE^EE E ex p{f EEE^^I 



j-2,v j-2,v\ 
Si S_i J 







iL^iogz Hop ueue> v ) +^EE^(^i+i© 



i/=i 



=1 »,j 



-LlogZ„ op 0, {£'}.{?} 



= eeee / in 



dm] 



2n 



3n+f3)N . 



n 

Lis 



(H.l) 



where 



ft = (f»+^EE "few + £ E E E + f EE 



/3n + /3 



, E(^) 2 -^E(^) 2 -^E^-2io g2 



log^Hop [A tf 1 }, {I 2 }] - ^ E^Hop [ft tf 1 }, tf 2 '"}] (H.2) 



N ° L J ' J J /?iV 
Summing over $\{\sigma\}$, we obtain



where 



Es = ^log[(/3n + ^)iV] + ^log(/3iV) + ^log(^iV)-21og2 



+ ^EE lo S 2cosh 



i 1 



{fin + Mm] + ^™i V + 08** 



/3n + /3 



~ Ek 2 ") 2 - f E™ 2 - £ lo ^H OP [p, m, {I 2 } 



Li/ 



PL 



^iogz Hop [ft ft 1 }, ft 2 '"} 



(H.4) 



Using the following change of variables, 



m] — >■ m* + <5m; / a/]V 
m^ ->■ rhi/VN 



m 2,u _^ m^/y/N 



we have 



w = EEE/dl 



d<5m; 



2tt 



2,1/ 



2tt 



J-J 1 , X E 1 



2tt 



E A = C 1 + ^^^log2cosh 



(/3n + M^ + /3n + /?fl 



N 



ft f2,y 2,1/ . P J2 ~ 



(H.5) 



—L log 2 cosh ((3n + (3)m 



^ 1 ^o 2 - ( ^ + / J )m 'E*">- 



2 AT 



il E('"?'T-|^E*?- 1 



/9m 



*2 



2 A^ 



*2 



2AT 1 _ _ m *2) z_ 



E 



2 AT 1 - 0(1 - m *2) 



E ^ 



(H.6) 



where 



Ci = ^log (0n + 0)N 



Lri L ~ 

— log(PN) + — log(pW) - 2 log 2 + L(/3n + /3)m* 



-L (l + ^ log 2 cosh(£m*) + + l0 § [l " ftl - ™* 2 )] 



+L log 2 cosh ((3n + (3)m 



(H.7) 
Setting $m^* = \tanh[(\beta n + \beta)m^*]$ and expanding the first term for large $N$, we get

i 



l V 



2 



l-0(l-m- 2 



+ (i _ m* 2 )__ EE m ^ m ^ 2 '^ 
+(i-^* 2 )/3^E^ m ^ v 



i,l,v 
~.2s 



+/3(/3n + #(1 - m* 2 )^ ^mf'^m,^ 

+ (1 - m* 2 )P(Pn + ^)^E 

u 



2^1-^(1-^ 

,*2 



2^1-^(1-771*2) ^ 



E 



— V^'" 



(H.8) 



Note that this expression is linear in $l$, so it can be rewritten as



cn=EEE n 



-I L 



2tt / V2tt 



(H.9) 



We define 



t = — V^ 2 ' M 



2,fi 



N 

i 



and we perform the Gaussian integral in (H.9):

L (3m* 2 u 2 



( Nn ) = EEE ex p 
m { | 2} {e , v} 



Ci 



L /3m* 2 



E 



2 1 - 0(1 - m* 2 ) 2 1 - /3(1 - m* 2 ) ^ 



- - log det M + -A*M _1 A 



(H.10) 
where
Qn-l,r,



(H.11) 



A = ( y/N(m* — m*)(/3n + f3) m*/3u m* fci m*/3s 2 • • • m*(3s n ) , (H.12) 



and 



d 1 

d 2 

d 3 
u v 
U 



= (f3n + f3) 1 - (f3n + /?)(! - m* 2 ) 



= (3 l-/3(l-m* 2 ) 







l-/3(l-m* 2 ) 



= -(1 -m* 2 )P(f3n + P)u u , 

= ~(l-m* 2 )PPt u , 

= -(l-m* 2 )P(Pn + /3)s u , 

= -(l-m* 2 )p 2 q^. 



(H.13) 
(H.14) 
(H.15) 

(H.16) 
(H.17) 
(H.18) 
(H.19) 



We can now add the Lagrange multipliers 



ee e /n d ^n d ^n d ^n d ^n ^dudue^o) 

J2 s l-^ lo S det M + ^A l M- l A 



l y {in « : 

Ci ct (3m* 2 u 
_ 



2l-/3(l-m*) 2i_/3(i_ 



2 E ^ - ^ E s ^ - 2 E * A - 3-7^ + 2^ E 

p,a V p p - 1 j,p,CT 



l,p l,p I 

and we have finally the expression presented in chapter 7. 
Details of the evaluation of (N) for
the Hopfield model with large h 



We start with 



$$
S = -\sum_{\{\xi^1\}}P\left[\{\xi^1\}\,|\,\{\sigma^l\}\right]\log P\left[\{\xi^1\}\,|\,\{\sigma^l\}\right], \tag{I.1}
$$



and average this entropy with respect to both the first pattern and the measured configurations. The probability is given by



p[W}\ie},{e}]p[{e},{e}} 



WW] 



(1.2) 



zn oP [PAe},{e}] L 

L 

i 1=1 



exp 



I 
N 



/i=l,2 J=l «<j 



(1.3) 



The normalization M is given by 



AA[{^}] 



^WMeue^ xp 

L 

+/*EE^ 

i 1=1 



I 

N 



M=l,2 Z i<j 



(1.4) 



As in Section 7.1, we write our entropy as a derivative of a modified normalization $\mathcal N$:



N[{a 1 }} = 



P 



I 

N 



(1=1,2 I i<j 



i 1=1 



S = 



iLio g z nop [p,{e},{e},W}] 

dhgN 



P 

logiV 



P=p d/3 



where we have supposed $P[\{\xi^1\},\{\xi^2\}] = 2^{-2N}$.



I.1 Determination of $\log Z_{\rm Hop}$

To write an explicit expression for $\mathcal N$, we have to evaluate $Z_{\rm Hop}$:
i Px
where



+ ^ log [2 cosh(/^)] + 0(1/^) I , 



Finally 



log Z Hop = lo § t 2 cosh(^)] - log {fix) + |^(<7i + <7 2 2 ) + 0(1/V^V) . 



(1.5) 
(1.6) 



(1.7) 

(1.8) 
(1.9) 

(1.10) 
I.2 Determination of $\langle\log\mathcal N\rangle$



To determine $\langle\log\mathcal N\rangle$, we again use the replica trick,



{i 1 } m W it 1 -"} u 2 -n I < 1=1 
n L



E E E ^(c 1 ^ + rr ) - ^ E lo s ^ho P [p, {en,
+ p E E + gg) - l bg z Hop {e 1 }, ii 2 }}
(1.11)



After doing an integral transform, we obtain 
Summing over the spin variables $\sigma$ yields



m {i 2 } ti 1 '"} 



n%nn 



dm 2 ,„ \ ( dm 1 dm 2 \ NEa 
Performing a Taylor expansion and using the result for $\log Z_{\rm Hop}$, we have
—m x m 2 Hii I 1 - tanh 2 [(/3n + 0)hi



Several of these terms can be ignored. First of all, $\frac1N\sum_i\xi_i^1\xi_i^2\{1 - \tanh^2[(\beta n + \beta)h_i]\}$ can be considered of order $O(1/\sqrt N)$, since we suppose that the patterns are orthogonal at leading order. The same holds for the inferred patterns, so $\frac1N\sum_i\xi_i^{1,\nu}\xi_i^{2,\rho}\{1 - \tanh^2[(\beta n + \beta)h_i]\} = O(1/\sqrt N)$. Finally, since we can permute $\xi^1 \leftrightarrow \xi^2$, we only lose one unit of entropy if we suppose that $\xi^{1,\rho}$ is an approximation to $\xi^1$ and not to $\xi^2$. We can thus consider $\frac1N\sum_i\xi_i^{\mu,\rho}\xi_i^{\nu}\{1 - \tanh^2[(\beta n + \beta)h_i]\} = O(1/\sqrt N)$ for $\mu \neq \nu$.

After neglecting these terms, we see that the two patterns are completely decorrelated in $\mathcal N$, with no extra term accounting for an effective influence of one pattern on the other. We can hence conclude that the results obtained for the Mattis model apply to this system.